FOREWORD


THE FIELD of psychological testing, including personality and character
study, is being vigorously developed at the present time. So much material 
was submitted by the Committee for this issue that it had to be greatly re- 
duced in order to keep within allowable limits. Both the text and the 
bibliographies were heavily cut. The Committee members should therefore 
receive credit for an even greater amount of work than is apparent from the 
material which is printed herewith. An effort was made not to eliminate any 
significant point in the original manuscript, or to distort the emphasis. In 
order to keep the issue within its present expanded size, however, rigorous 
economies had to be exercised. 
Douglas E. Scates,
Chairman of the Editorial Board. 





INTRODUCTION 


TO OBTAIN ARTICLES for this issue, Psychological Abstracts from January
1935 to December 1937 were scanned. Abstracts bearing on psychological 
tests were cut out, sorted according to the chapter headings agreed on by 
the Committee, and sent to members. Each member also combed the litera- 
ture for his section. It is believed that this method has reduced to a mini- 
mum the overlapping which might ordinarily be found between chapters. 

The references in the bibliography, carefully selected from many more, 
testify to the enormous amount of investigation which is going on in the 
field of psychological tests. 

The organization of chapters has been altered since the June 1935 issue 
of the Review devoted to “Psychological Tests.” A special chapter on tests 
with infants has been included. Another chapter, on the applications of 
tests of non-intellectual functions, appears for the first time. 


Percival M. Symonds, Chairman,
Committee on Psychological Tests. 





CHAPTER I 


Review and Preview 


PERCIVAL M. SYMONDS 


This BRIEF CHAPTER endeavors to present a bird’s-eye view of the more 
important trends in the work on psychological tests. The writer also ventures 
to make predictions of future developments and to point out what seem to 
be the more promising fields for research on psychological tests. 

The construction of new tests of intelligence has slackened perceptibly 
during the past three years. A decade ago a review of a three-year period 
would have disclosed widespread activity in the construction of new tests of 
all kinds of mental functions but the present review describes only a handful. 
The outstanding new test is the Revised Stanford-Binet. Interview methods 
of estimating intelligence, and intelligence tests which may be administered 
orally, have been explored. Present-day test construction represents a
refinement of procedures, careful standardization, and detailed statistical analysis.
We are in the period of detailed quantitative and qualitative analyses of 
individual items on tests. 

No important new trends have developed in the past three years. Much 
work has been done, however, in the employment of intelligence tests in a 
wide variety of studies seeking to discover the relationship of intelligence 
to every conceivable human function. Intelligence tests continue to be used 
for their possible value in connection with evaluating or predicting school 
progress. They have also received considerable study in connection with 
occupational status, and in the attempt to gain a better understanding of 
abnormal and pathological mental states. 

A decade ago studies on the constancy of the I.Q. were interpreted as 
leading to the conclusion that the I.Q. was invariant and presumably a
native function. The recent work of Newman, Holzinger, and Freeman on 
identical twins, the work of Wellman on nursery school children in Iowa, 
and the work of Skeels on children in foster homes—all reveal the possi- 
bility of marked changes in I.Q. These findings call for a rethinking of the 
problem of nature and nurture, which will undoubtedly receive consider- 
able attention within the next few years. 

The influence of counseling procedures in high school and college will 
undoubtedly be a topic for future study and discussion. There is much to be 
learned concerning discrepancies between intelligence and school per- 
formance. Studies to date have not revealed that there is any group of 
factors responsible for these discrepancies. Future work will probably pro- 
ceed to the more intimate study of individual cases. The use of intelligence 
test results in the guidance of high-school and college students toward their 
future educational and vocational careers still is uncertain and more work 
will have to be done before established procedures are arrived at. 

One gains an impression that more is known about intelligence than is
being used at the present time. There is room for discussion of social 
action with regard to problems of eugenics, cultural influences affecting 
intelligence, and the adaptation of school and other social institutions to- 
ward a more effective development of intelligence. 

What is said concerning intelligence tests is also true of aptitude tests. 
The three-year period under review has not seen the development of many 
new aptitude tests, but it has witnessed great activity in the study of exist- 
ing tests for their value in vocational guidance and the prediction of 
vocational performance and success. Existing aptitude tests have been
subjected to the new analytic tool of the period, factor analysis. Previously,
work on aptitudes had to do largely with such general composites as 
mechanical aptitude, clerical aptitude, medical aptitude, and the like; and 
whereas interest in these more general abilities has continued, there is 
evidence of growing interest in more specialized abilities. In particular, the 
period has witnessed much activity in the study of automobile driving 
and the characteristics of successful aviators. There is a beginning in 
the development of specialized tests in such professions as law, medicine,
dentistry, teaching, engineering, and nursing. 

The period has also seen the beginning of attempts to study the per- 
sonality characteristics which condition success in various occupations, 
recognizing that the problem of occupational adjustment is wider than the 
abilities involved. Attempts to predict teaching success through aptitude 
tests have been disappointing, and personality tests of the questionnaire type 
do not prove to be much more valuable. Along with the trend of the times, 
there is evidence that the study of occupational adaptation must proceed 
to a more individual and clinical basis before we will know exactly what 
factors are involved, and there are indications that this will more and 
more characterize research for vocational fitness and success in the future. 

There has been tremendous activity in the development of new methods of 
personality study. The period under review might be called the heyday of 
the exploration of the possibilities of the questionnaire. The groundwork 
which was laid in the preceding fifteen years has come to its climax and 
it is doubtful if such interest in the questionnaire will be maintained in 
the years to come. A wide variety of methods and approaches has been 
employed, many have proved fruitless, but undoubtedly a small residue of 
useful methods will remain as the permanent contribution to this work. 
Watson, in his review of adjustment questionnaires, comments: “It is 
probable that reports of unhappiness are less contaminated with pretense 
than are reports of happiness.” This statement probably applies to person- 
ality questionnaires of all kinds whether of adjustment, interest, or atti- 
tude, and indicates that questionnaires showing a high degree of unfavor- 
able response have some value in identifying persons who need further 
study. There is an enormous number of articles devoted to the development 
and application of attitude measurement. 

Not only have the conventional methods of studying personality been
developed further but there has been much activity in exploring less 
familiar procedures. The Rorschach method must be considered now as 
having a recognized place in the study of personality in American psy- 
chology. A number of studies have used the matching method for the 
evaluation of some of the analytical and undifferentiated indexes of person- 
ality. Measures of perseveration and perception have received some ex- 
ploration. 

It is the writer’s judgment that methods for the study of personality 
based on measurement have hardly fulfilled the expectation originally 
held for them on the basis of the success of measures of intelligence. One 
can note a rise in more qualitative and analytical methods of personality 
exploration. The Rorschach method, the various methods explored by 
Murray in the Harvard Psychological Clinic, the variations in the use of 
free association, and the use of so-called “projective technics” in the ob- 
servation of children’s use of play materials, drawings, dramatic activity, 
and the like, are all indications of the direction in which the exploration 
of personality is now moving. These methods depend on the insight and
experience of the observer, rather than upon a blind computing of scores. 
It is possible that the genius of American psychology which is to quantify 
and systematize psychological methods may show how to make these 
analytical methods more objective. 

The period under review is also characterized by exploration of the 
value and significance of personality tests. Scores of studies have used the 
Bernreuter and Thurstone personality schedules—indiscriminately, it is 
believed—particularly in view of the fact that the validity of such measures 
has never been definitely established. That much of this work is proceeding 
blindly rather than on the basis of insightful theory and promising hypoth- 
eses is testified to by the large number of studies in which little or no 
relationship is found. 

Work with tests of social attitude is more promising. Much has been done 
in the applications of the Thurstone Attitude scales. The beginnings of 
studies on the formation of attitudes are noted, but there is room for a wide 
variety of experimentation on the formation and changes in social attitudes. 

There has been surprisingly little research on the use of interest meas- 
ures, in view of their potential value in guidance. There is much profitable 
work to be done in this area; technics are readily available. 

The advancement of statistical technics during this period is astounding. 
Many will be surprised to learn that the traditional Pearsonian statistics, 
which have been the stock-in-trade in scores of statistics texts for the past 
thirty years, are due for a decided transformation. Curiously enough, the 
first satisfactory popular statistics textbook employing the new Fisher prin- 
ciples has yet to appear. The comment made by Cureton in his section, to 
the effect that “the more general use of modern sampling theory and 
exact tests of significance will require much careful expository work to 
make the technics more generally available and may in fact imply no 
less than a total abandonment of the idea of non-mathematical elementary
statistics,” is significant. Does this point toward the probability that
statistics is to become a technical tool to be used only by the experts, and that the
dream of having all psychology and education students master elementary
statistics in order to handle their test results and experimental procedures 
will have to be abandoned? Cureton has made such an excellent summary 
and recommendation of future research that it is not necessary to elaborate 
these ideas at this point. 

The construction of a simplified test-scoring machine and the develop- 
ment of simplified methods of handling mass data point to the possibility of 
a more widespread use of psychological and educational tests. The merging 
of psychophysical theory and the theory of test construction points to- 
ward a better established theoretical basis for measurement procedures. 
Finally, one must commend the industry which has been shown in develop- 
ing, refining, and applying theories of factor analysis. The writer must 
admit that he leans toward the interpretation given by Thomson and 
Tryon, that there is no guarantee that factor analyses of batteries of tests 
represent psychological realities. It will probably be agreed that this work 
is in a highly experimental state and that considerable further work is 
destined to be done before we are certain as to the real value of the 
methods. 

Following is a list of the studies which have received special praise 
in the chapters of this Review: 9, 37, 43, 52, 54, 55, 114, 171, 215, 253, 301, 
331, 346, 362, 440, 573, 647, 691, 778, 818, 870, 918, 919, 920, 949, 950, 
958, 980. This list may be thought of as an honor roll of the studies reviewed 
in this issue. 





CHAPTER II 
Intelligence Tests¹


PSYCHE CATTELL 


Individual Tests of General Intelligence 


PROBABLY the most important contribution to the field of intelligence
tests during the last three years is the New Revision of the Stanford-Binet 
by Terman and Merrill (55). Ten years of work have been put into a 
thorough and careful revision and restandardization. The range has been 
increased both upward and downward and now extends from two to 
twenty years. The test contains a richer sampling of abilities than the old 
scale. Using the ninety items of the old Stanford-Binet as a starting point, 
the authors developed provisional scales having 408 items which were 
selected from “a study of every kind of intelligence test item that has been 
used or suggested” or that could be devised by the authors. From these 
provisional scales, the authors developed two forms of the test, each con- 
taining 128 items. An unusual effort was made to secure a representative 
sample of the white population of the United States for the standardization. 
Over 3,000 subjects were used from seventeen different communities in 
eleven states. One hundred cases were examined at each age group between 
the ages of two and five and between fifteen and twenty inclusive, and 
two hundred cases at each of the intervening ages. In each age group an 
equal number of boys and girls were tested and each child tested was given 
all the items in the provisional forms appropriate to his mental age. 

The index of reliability was found to vary with the intelligence quotient 
rather than with the chronological or mental age, but was high at all 
levels. The test is of too recent date for reports regarding its practical use 
in the field to have been published. The present writer has used it extensively 
with preschool children between the ages of three and six years and has 
found it markedly superior to the old form and to any other intelligence 
test available for these ages. 

The only article concerning the New Stanford-Binet that has appeared 
is one by Mayer (39) dealing with negative reactions aroused by the 
tests in a group of 277 children between the ages of eighteen and sixty-six 
months. Mayer’s data showed the most frequent negative responses at three 
years, though the amount is fairly constant between the ages of two and 
four and one-half years. After four and one-half years, there is a marked 
drop. Two of the items which required verbal responses to verbal stimuli 
aroused negative reactions in over 30 percent of the children, while items 
involving objects and manual performance aroused very little or none. It 
might be added that the amount of antagonism aroused in the child by any 
test is closely related to the skill of the examiner. 


1 Bibliography for this chapter begins on page 318. 

Another recently published individual battery of tests is the Detroit Tests
of Learning Aptitude (5). An examination of the test material and pupils’ 
record booklet gives the impression of a test carefully thought out and 
well put together. The directions for scoring and administering are adequate 
and clear, but the discussion of the scale, its general use, validity, reliability, 
and the construction of the table of norms is disappointing. All that can be 
determined from the manual regarding the number of cases on which the 
subtests are standardized is that it is probably under fifty. The test as a 
whole was not standardized, the total mental age being the median of the 
mental ages on the subtests. Intercorrelations were obtained for sixteen of 
the tests. No reason is given for the omission of the other three. No table 
of the coefficients is given, and no reliabilities are reported except for one 
subtest. It is claimed that the results of this test “are very closely com- 
parable with those of the Stanford-Binet.” This statement appears to be 
based on the fact that forty-five retarded children in special classes
obtained I.Q.’s averaging 4.6 points lower than they had on the
Stanford-Binet two years previously. Worcester and Corey (68), in a review of this
test, pointed out other weaknesses. 
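
The scoring rule described in this paragraph, in which the total mental age is simply the median of the subtest mental ages, can be illustrated briefly. The Python below is a sketch only: the function names, the nineteen sample mental ages (in months), and the use of the conventional ratio I.Q. (100 times mental age divided by chronological age) are assumptions made for illustration and are not taken from the Detroit manual.

```python
from statistics import median


def total_mental_age(subtest_mental_ages):
    """Return the median of the subtest mental ages (all in months)."""
    if not subtest_mental_ages:
        raise ValueError("at least one subtest mental age is required")
    return median(subtest_mental_ages)


def ratio_iq(mental_age_months, chronological_age_months):
    """Conventional ratio I.Q.: 100 * MA / CA (assumed here, not from the manual)."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical child: nineteen subtest mental ages, chronological age 120 months.
ages = [96, 100, 102, 104, 105, 106, 108, 108, 110, 111,
        112, 114, 115, 116, 118, 120, 122, 126, 130]
ma = total_mental_age(ages)   # the tenth of the nineteen ordered values
iq = ratio_iq(ma, 120)
```

Because a median ignores how far the extreme subtests lie from the center, a total obtained this way is not equivalent to a score standardized on the battery as a whole, which is the weakness noted above.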

A translation of the revised and extended Bühler and Hetzer tests
appeared in 1935 (9). These are treated in Chapter III.

Maizlish (37) revised Snedden’s intelligence test (49) which is based 
on a vocabulary test given in the disguised form of a hereditary question- 
naire in which the subject is asked which of his two parents possessed the 
greater amount of seventy-five traits, such as being meticulous, sanguine, 
gentle, etc. With the purpose of making the interview appear more natural 
and at the same time of obtaining more information regarding the subject 
tested, Maizlish changed the test into a likes and dislikes questionnaire. 
“The subject was asked to cooperate in a study which aimed to determine 
why we like and dislike certain people.” He was asked to tell whether he 
liked or disliked a person characterized by the given word and then to 
give his reasons for his likes or dislikes. The latter part of the question 
was introduced mainly to determine whether or not the subject was guess- 
ing at the meaning of the words, but the reasons given by the subject for 
his likes and dislikes also served to throw light on his personality and
attitudes. Snedden’s form of the test correlated only .16 ± .13 with the
Kuhlmann-Anderson Intelligence Tests while the revised form yielded .77 ± .04
as an individual test and .50 when given as a group test. 
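
The small figure attached to each of the two individual-test coefficients is presumably a probable error, as was customary in reports of this period; that reading is an assumption, since the text does not say so. Under that assumption, the classical formula for the probable error of a Pearson coefficient can be sketched as follows. The function name and the sample values are hypothetical and do not reproduce the data of Snedden or Maizlish.

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Hypothetical case: a coefficient of .50 obtained on 100 subjects.
pe = probable_error_of_r(0.50, 100)   # about .05
```

A coefficient that is little larger than its probable error, as with the first of the two figures above, gives slight assurance that the true relationship differs from zero.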

Tuckman (65) constructed an intelligence test using as material cartoons 
taken from the weekly magazine The New Yorker. Seventeen cartoons were 
selected, each consisting of from two to twelve parts. The cartoons were cut 
apart and presented in random order to the child, who was asked to arrange 
them in the correct sequence. The test is an interesting novelty rather than a
scientifically valid measuring scale. A reliability coefficient of .93 and a
validity coefficient of .85 were reported, but because they are based on a
group of 114 cases with an age range of twelve years, they do not indicate a
satisfactory intelligence test.

Group Tests of General Intelligence


The California Tests of Mental Maturity (38, 54) consist of four bat- 
teries, one for kindergarten and first grade, and the others for Grades I 
to III, IV to VIII, and VIII to XIV. Each battery consists of sixteen sub- 
tests. The total time required for the administration of the tests is about 
ninety minutes. The first three subtests are for the purpose of discovering 
any children who have visual, auditory, or motor handicaps sufficiently 
severe to interfere with their success on the remainder of the tests. The other 
thirteen tests give mental ages and intelligence quotients for what the 
authors term a “total mental factor,” a “language factor,” and a “non- 
verbal factor.” The non-verbal tests require long verbal instructions and 
appear to be almost as much verbal tests as the others. The first items of 
some of the subtests appear to be such that they would be likely to dis- 
courage a child of the lowest grades for which the battery is intended. As 
a whole, however, the tests appear to be excellent, and appear to have high 
reliabilities. For the pre-primary and primary scale, where high reliabili- 
ties are difficult to attain, the coefficients for the subtests run from .70 to 
.95, and for the total battery from .90 to .96. 

McCall, Herring, and Loftus (34), in their comprehensive tests, covered 
a wider field than the usual intelligence and achievement tests. The tests 
consist of four batteries: (a) an intelligence test; (b) an achievement
test; (c) an educational background questionnaire; and (d) a school
practice questionnaire. Each test contains from 100 to 150 items and 
covers Grades III or IV through IX. The intelligence tests consist of 150 
rows of five words or numbers each. The subject is asked to cross out the 
one word that does not belong with the others. The achievement tests cover 
a wide field of knowledge taken both from inside and outside of school. 
There are nineteen headings which include such captions as “enjoying life,” 
“buying and using things,” and “arts and crafts.” The educational back- 
ground questionnaire is a new type of test and is planned to supplement 
the intelligence tests as a measure of educability. It is based on the
assumption that ability to learn is in part dependent on cultural factors, and it
is designed to supply information about the home and the community 
which will assist the teacher in understanding the child’s liabilities and 
assets. The school practice questionnaire is a test of the extent to which a 
school has the “characteristics of democratic activity.” 

The tests leave room for a number of minor improvements, but represent 
pioneer work and appear to have been carefully constructed and put to- 
gether. The indexes of reliability are high—that reported for the intelli- 
gence test is .97, for the achievement test, .96, and for educational back- 
ground, .91. The norms for the achievement and intelligence tests are 
based on 20,000 cases and for the educational background on 5,000. The 
batteries used in combination make possible the measurement of the broader 
aspects of intelligence and education that have often been neglected and 
should prove a valuable contribution to the measurement field. 

The Otis Quick-Scoring Tests of Mental Ability (43) consist of three
batteries. The Alpha Test² is an entirely new test planned for Grades I to IV.
The Beta Test for Grades IV to IX and the Gamma Test for High Schools 
and Colleges are revisions of the Otis Self-Administering Tests of Mental 
Ability and are similar to them in content. The Alpha Test is composed of 
ninety rows of four pictures each and the child is asked to mark the one 
that is not like the other three. Since this battery is made up of one type 
of item only, it is likely to prove more satisfactory when used in combina- 
tion with other tests than when used alone. Two forms have been published 
for each of the three batteries and others are planned for the future. The 
only instructions are the preliminary ones and the word “stop” at the end 
of twenty minutes in the primary test, and at the end of thirty minutes for 
the other two. The tests are thus simple to give and they may be quickly 
scored by means of the special answer sheet and the perforated scoring 
key provided. The reliability coefficients of the tests are rather low. Forms
A and B of the Alpha Test correlate only to the extent of .86 over a grade
range of four years. The norms for the Beta Test are based on adequate 
numbers, but those for the other two batteries, labeled tentative, appear to 
be very inadequate. 

Stump (52, 53) is in the process of developing a group intelligence test 
to be presented orally. The subject matter is similar to that included in
many group tests. A number of advantages are claimed, the most important 
being: avoiding of difference in score resulting from difference in speed 
of reading; opportunity for every subject to attempt every item; and 
considerable saving in expense through use of blank paper instead of 
printed booklets. There are two forms of the test planned to cover Grades 
IV to VIII. A correlation coefficient of .74 was found between this test and 
the Terman Group Test of Mental Ability. Mental age norms and reliability 
coefficients are promised for the near future. 

Garth (26), in a study of riddles as an intelligence test, gave a series of 
eighteen riddles to 143 college students. The reliability of the riddle test 
was high, namely .94, but the correlation with the Army Alpha Intelligence 
Test was only .54 and with Trabue’s Completion Test, .35. 

The Cattell tests (12), published in England, consist of four series of 
tests of two forms each: Scale O for mental ages four to eight; Scale I for 
mental ages eight to eleven; Scale II for mental ages eleven to fifteen; and, 
Scale III for mental ages over fifteen. Scale O is administered individually, 
and the other scales may be given either as individual or group tests. The 
tests are apparently excellent. The handbook, however, gives no information 
regarding the validity and reliability of the scales. Scales I and II were 
revised in 1936 (13). Tentative norms for Scale I, based on 500 “selected 
average” children, are given. 

2 The Alpha Test has been temporarily withdrawn from the market, but is expected to be reissued
shortly in the same form, but with provisions for breaking up the long concentration period required
of the pupils.







June 1938 INTELLIGENCE TESTS 


Thorndike, Woodyard, and Lorge (57, 58) published four new forms of 
the CAVD test for the college and high-school levels. The old form and 
each of the new ones were given to one hundred adults between the ages 
of twenty and seventy. Each item included in the new form is of known 
difficulty “in terms of the items of the original form.” The scoring has 
been made quicker, easier, and more objective than that of the original form. 

New revisions of the Army Alpha Intelligence Tests continue to appear. 
A recent revision by Schrammel and Brannan (47) consisted of three
forms; and one by Bregman (7), of two forms. The authors of both
revisions selected from the five forms used in the army those items most
appropriate for general use, eliminating or modifying items which might
not be appropriate for persons not connected with the army. The norms
for the revisions of Schrammel and Brannan cover ages eleven to eighteen,
and Grade VI through university graduate students. They may, therefore, 
be expected to be higher than norms for the general population. The 
revisions of Bregman have been equated to the original forms used in the 
army. The selection of cases and the method used are not made clear in the 
manual. Neither of the manuals of instruction offers any evidence regarding
reliability and validity. 

Other group tests of general intelligence that have been recently pub- 
lished are Dawson’s Mental Test for ages eleven and twelve (16); the 
Moray House Tests for ages ten and eleven (56); the Junior School 
Grading Test for ages seven to nine (1); the Thanet Mental Tests for 
age eleven (2); the Orton Intelligence Test for ages ten to fourteen (42);
Gibson’s Intelligence Tests for ages ten to thirteen (27); the Kelvin
Measurement of Mental Ability for ages eight to fourteen (21); Smith’s Test for
Grade VIII (48); and the Nebraska revision of the Army Alpha (28).
Either revisions or new forms of the following tests have been put on the
market: Detroit Beginning First Grade Intelligence Test (20); the
Henmon-Nelson Tests of Mental Ability (30); Teachers College Psychological Test
(35); Thorndike’s Intelligence Examination for College Entrance (59);
Thurstone’s Psychological Examination for College Students (60, 61, 62);
and Thurstone’s Psychological Examination for High School Students (63). 

Two studies, one by Benton (6) and the other by Ferguson (21), indi- 
cated that strong motivations on the part of pupils did not assist them in 
gaining higher scores on a group intelligence test. The studies both used an 
experimental and a control group, both offered, among other incentives, 
prizes to the experimental group for improvement in their second score over 
their first score, and both used Otis tests. In both instances a greater gain— 
statistically insignificant—was found in favor of the control group. 


Tests of Particular Mental Abilities 


A number of special types of tests have been described or published dur- 
ing the last three years. Wilson and Flemming (66) studied the relationship 
between perception as measured by various figures, letters, lines, etc.,
presented in an exposure apparatus, and various tests of reading. They used
twenty-five first-grade children as subjects. The correlation between the 
various types of reading and perception tests averaged .22. 

Stuart (51), in a study of the development of reasoning ability in 1,400 
children between the ages of nine and eighteen years, applied a number of 
tests which included perception of forms, arithmetic reasoning, a standard 
group intelligence test, general information, ethical judgment tests, and 
others (ten in all). The intercorrelations for the various tests ranged from 
.06 to .70 and averaged in the low fifties. The author concluded that since 
motor control and the ability to handle concrete data correlate highly with 
the various types of mental ability tests, they should not be excluded from 
general intelligence tests. Chronological age, however, was apparently not 
held constant. It is well known that these abilities correlate highly with 
chronological age and, since the range is nine years, it cannot be determined 
from the data reported how much the coefficients are influenced by age. 
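
The caution about chronological age can be made concrete with the first-order partial correlation, which estimates the relationship between two tests with age held constant. The Python below is a minimal sketch; the function name and the three input coefficients are hypothetical and are not drawn from Stuart's data.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial r)."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical case: two tests correlate .50 with each other, and each
# correlates .60 with chronological age over a wide age range.
adjusted = partial_correlation(0.50, 0.60, 0.60)   # roughly .22
```

In this invented case more than half of the apparent relationship between the two tests disappears once age is held constant, which is why coefficients obtained over a nine-year range are hard to interpret.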

Jalota (32), in a study of the value of memory tests for use as tests of 
general intelligence, found a correlation of .49 between a memory and a 
group intelligence test. A test for scientific thinking with a high reliability 
coefficient devised by Downing (15) gave a reasonably consistent increase 
in score from Grades VIII to XII. The same pupils, however, showed an 
irregular increase from age to age, indicating that this ability was the result 
of learning rather than of innate intelligence. The author wrote that: “Gen- 
eral intelligence as expressed by the I.Q. is something quite different from 
the ability to handle either the elements or safeguards of scientific thinking.” 
Similarly, Edmiston (19) found that the ability to make generalizations 
from certain paragraphs just read could be increased by training. 

Gupta (29) devised a series of tests of reasoning ability based on a well- 
known story. The answers of fifty-five children are given in full and dis- 
cussed at length. The conclusions drawn appear to be based on subjective 
opinion rather than on scientific evidence. Wells and Hylan (67) modified 
and extended into a series some individual test items and administered 
them together with the Army Alpha Intelligence Test to small groups of 
students of superior ability. The items include, among others, ingenuity, 
reversed clocks, and inverted forms. The authors found that the initial score, 
or what is equivalent to the usual psychometric clinical test, may be reversed 
after a series of repetitions. In conclusion they wrote: “It cannot be too 
often or strongly insisted on that psychometric techniques cannot be inter- 
preted in isolation without risk of grave error.” 

Oden and Mayer (40) found the ball and field or plan of search test to 
be unsatisfactory below the superior plan. When the directions were worded 
in such a way as to make no demand for thought or for planning, almost 
60 percent of the responses were such that they would receive credit at the 
lower level. 

Non-Verbal Tests of General Intelligence 


A number of non-verbal and performance tests of intelligence have been 
described or published during the last three years. Most of them are at- 
tempts to construct tests that can be used satisfactorily with subjects of 
varying cultures, who speak different languages. 

Leiter (33) constructed a test the administration of which requires neither 
verbal instructions nor pantomime. It is planned for use with subjects of 
all cultural and racial backgrounds and covers the age period from the pre- 
school child to the adult. The tests and procedures for administering them 
are described in detail. The standards are based on 1,400 cases and a re- 
liability coefficient of .91 ± .03 obtained by the split-half method is re- 
ported. Another non-verbal intelligence test covering a wide age range is 
the Chicago Non-verbal Examination by Brown and his associates (8) 
which is planned for ages seven through adulthood. 
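
For readers unfamiliar with the split-half method mentioned above, a minimal sketch follows. It assumes an odd-even split of the items and the Spearman-Brown step-up to full test length; the source does not say which split or correction Leiter used, and the function names and the tiny data set are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half scores, then step up the
    half-test correlation to full length with the Spearman-Brown formula."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)

# Hypothetical data: four subjects, four items scored 1 (pass) or 0 (fail).
scores = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(split_half_reliability(scores))
```

The step-up is needed because each half is only half as long as the full test, so the raw correlation between halves understates the reliability of the whole scale.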

Penrose and Raven (44) held that tests of perception ability are the most 
satisfactory tests of educative ability which are uninfluenced by special 
training or cultural background. The authors are now working on the stand- 
ardization of a finely graded series of perceptive tests, and plan to put out 
an intelligence test which is little affected by education or cultural back- 
ground and which they believe can be used with some modification for the 
deaf and the blind. Hildreth and Pintner (31) published a manual with 
explicit directions for administering and scoring a short form of the Pintner- 
Paterson Tests. The scale consists of nine of the performance tests included 
in the longer scale. 

Oliver (41) described briefly the validation and standardization of a 
non-verbal test planned for use among South Africans of similar cultures 
but speaking different dialects. The subtests included are similar to those 
commonly found in paper and pencil non-verbal tests. Drever and Collins 
(17, 18) brought out a second edition of their book on performance tests 
for deaf and normal children. New material has been added to the tests and 
additional data added to the norms. Verbal directions for “hearing” chil- 
dren have been added, but without revision of the norms. Arsenian (4) 
adapted Spearman’s visual perception test to pantomime directions in order 
that it might be used with non-English speaking people. The reliability of 
the test was found to be .88 ± .01. Cunningham (14) described a non- 
verbal test of general intelligence which is being devised for Austrian chil- 
dren, and Maekawa and Yendo (36) a performance test for Japanese kin- 
dergarten children. The latter has been found to have a positive but low 
correlation with general intelligence. Among other performance tests that 
have been recently described may be mentioned the Ontario School Ability 
Examination (3), Garth’s Puzzle Box (25), the Quasha and Likert Revision 
of the Minnesota Paper Form Board (46), and the peg boards and single 
block models by Forbes (23, 24). 

Directories 


Buros (10, 11) compiled two test directories, the first covering the years 
1933, 1934, and 1935, and the second covering the year 1936. The first 
edition listed practically all educational, intelligence, and personality tests 
published during the previous three years. They were conveniently classi- 
fied for ready reference. A publisher’s directory and an alphabetical index 
to the tests by title and by author were also included. The 1936 issue of the 
directory adds a list of recently published books on measurement. Each 
entry is followed by excerpts from reviews of the book. 

An index of periodical literature on testing by South (50), covering the 
years 1921 to 1936, lists 5,005 articles alphabetically by author. It will be 
found useful in locating articles, the authors of which are already known. 
For looking up references by subject it is less satisfactory. A subject index 
referring to the alphabetical list by number is given, but some of the head- 
ings are followed by fifty or a hundred reference numbers, most of which 
might have to be looked up before the desired article is found. Some head- 
ings, such as “differences,” and “diagnosis,” are so broad that they are of 
little value; before looking up the references one would like to know what 
kind of differences are referred to, and what is diagnosed. 





CHAPTER III 
Tests and Studies of Infants and Young Children¹ 


METTA MAUND RUST 


This REVIEW is based on psychological measurements of children under 
six years of age. 


General Summaries 


Dewey (97) reviewed the literature on prenatal and postnatal activities 
through the second year of life with special emphasis on the growth process, 
presenting a bibliography of 216 titles. M. C. Jones and Burks (143) dis- 
cussed experimental studies dealing with personality, three-fourths of which 
are concerned with infants and children under six years of age. Pintner 
(180), in summarizing intelligence tests, and Maller (166), in summariz- 
ing character and personality tests, devoted considerable space to infant 
and preschool levels. Wenger (218) reviewed the investigations on condi- 
tioned responses in infants. Wenger and Williams (216) summarized stud- 
ies of learning in infants and preschool children. Meek and Jersild (170) 
covered mental development from two to twelve years, and Cattell (83) 
covered the period of infancy. Richards and Irwin (187) examined the lit- 
erature on plantar responses and gave a bibliography of 117 titles. 


Psychological Scales for Infants and Preschool Children 


Recent research on scales used for testing general mental development 
of infants and preschool children has consisted chiefly in revising, refining, 
and extending the age range of the scales which are now in use, rather than 
in devising new ones. 

Gesell’s norms of infant growth—Gesell and others’ volume (114) is or- 
ganized as a practical manual, although it combines some monograph fea- 
tures. A developmental examination, a cumulative behavior inventory, and 
physical measurements are provided for each lunar month interval through 
the fifty-sixth week. The examination follows the general procedure pre- 
sented in Gesell’s book, The Mental Growth of the Preschool Child, and the 
normative summaries of preschool development, taken from this volume, 
appear in the appendix. This earlier volume is now out of print and a re- 
vised handbook of procedures and norms is in preparation. 

Bühler’s tests—Bühler and Hetzer (81) revised and extended the age 
range of Bühler’s earlier test; formerly it was for infants only, covering 
the period from two to twenty-four months. The revision extends through 
the sixth year. Some of the pictures included in the test materials are not 
appropriate for American children at the preschool level, and modern meth- 
ods of scaling were not used in constructing the scale. It appears to dif- 
ferentiate well between levels of development, however, and is valuable for 
sequential study of the individual child. The tests for each developmental 
level may be grouped into six categories as follows: sensory reception, 
bodily control, social behavior, learning, manipulation of materials, and 
mental production. The child’s score can be quantitatively and qualitatively 
analyzed. 

¹ Bibliography for this chapter begins on page 320. 

Note: Studies of development, reported briefly in this chapter, will be covered more 
completely in the issue scheduled for February 1939, Mental and Physical Development. 

Other tests—Based on an analysis of 650 records of children, Fillmore 
(104) devised a scale composed of 49 items which will be applicable to 
infants aged four months to twenty-four months. Although the internal con- 
sistency of this scale is reported as satisfactory, “it is not claimed that the 
growth pattern measured is equivalent to what is subsequently termed men- 
tal age.” The score as a whole correlates with later intelligence quotients 
.03 to .48. The New Revised Stanford-Binet Tests of Intelligence (202), 
the International Performance Scale (156), and the Haut Rational Learn- 
ing Board (173) cover the preschool age range. The California Scale of 
Motor Development, devised by Bayley (72), is applicable to children 
under thirty-seven months of age. Kasambi (144) described a series of six 
problem-solving tests which are adapted to Indian children above nine 
months of age. 

Applications of tests—There have been numerous reports of the applica- 
tions of the Bühler tests to different groups of children. Hubbard (131) 
tested and retested 78 infants at intervals averaging four months, by means 
of the Bühler test, and established alternate-item reliability coefficients 
above .98, a retest reliability of .70 for the first and second tests, and .94 
for the second and third tests. Correlations with ratings on Merrill-Palmer 
tests administered at later ages were higher than those reported for any 
other infant and preschool scales. Herring (124) also checked the reliability 
of the tests on 114 different subjects, between the ages of one and fifteen 
months, and concluded that although the tests seem to indicate a fair degree 
of reliability over a limited time interval, there was little consistency in 
scores over a period of several months. Both Hubbard and Herring con- 
cluded that the Bühler tests may be too easy for American babies, and since 
the administration of the tests is not well standardized, it would seem that 
they are more valuable as clinical than as research instruments. Other re- 
ports of the results of the Bühler tests are by Wolf (221) on Viennese chil- 
dren; Iwai and others (136), and Hofstatter (128) on Japanese children, 
and Reichenberg (184) on a study of cultural influence on the develop- 
ment of the child. Curti (88) reported that 76 Negro children of Jamaica, 
aged from one to three, who were tested on the Gesell Developmental Sched- 
ule were below the norms. For these children the validity of certain test items 
is questioned because their performance was irregular and inconsistent. 
Doscher (100) completed correlations between the Kuhlmann-Binet tests 
and the Randall’s Island Performance Series. Results indicated that in grad- 
ing those children who are not fitted for verbal tests, the Randall’s Island 
series is satisfactory and can be used for comparing the verbal and motor 
aspects of intelligence. 


Rating Scales for Young Children 


Specific functions—Williams and others (219) studied the language de- 
velopment of 285 children, between the ages of one and eighty months, and 
devised a tentative scale of language achievement covering these ages. A 
revision of Smith’s vocabulary test for preschool children was presented 
(219: 33-46, 79-87). The importance of environmental factors in deter- 
mining both the extent and the maturity of children’s vocabulary was 
stressed. Based on a representative sampling of 100 children between the 
ages of five and one-half and six years is Hildreth’s study (127) of their 
readiness for initial instruction as “typically organized in primary grades.” 
She constructed a readiness test for pupil aptitude for primary learning, 
and concluded that the best and poorest learners can be predicted. Seltzer 
(191), using the Thurstone technic of scaling attitude scales, devised two 
scales for nursery school children, one containing 42 statements, which tests 
singing ability; and the other composed of 44 items, which tests rhythmic 
ability. A color-form test for young children based on the Dearborn color- 
form test was constructed by Forbes (106). 

Other scales—Key and others (148) constructed a scale for grading the 
dressing skills of nursery school children. Experiments with 25 boys and 
20 girls, aged nineteen to sixty-four months, showed that the child’s age or 
maturation is of more importance than training. Fales (102) constructed 
a reliable rating scale by which the degree of vigorousness of preschool 
children’s play activities can be measured. Experiments using this scale dis- 
close a striking similarity between the vigorousness of the activities of boys 
and girls (103). 

Social behavior—Van Alstyne (211) devised a scale for rating social 
behavior and attitudes from the nursery school through the sixth grade. It 
consisted of thirteen situations and their response levels. Doll (99) evolved 
a Genetic Scale for Social Maturity which consisted of 117 items. The age 
range is from infancy through adult life. Ten familiar nursery school situa- 
tions were reproduced and scaled by Joel (140) to rate behavior maturity 
in nursery school children. Studies by Washburn and Hilgard (212) are in 
progress by means of which it is hoped further to objectify observations of 
social behavior of children between the ages of fifteen and fifty months. 
Bowley (79) developed a scale for rating the sociability of young children 
which combines qualitative and quantitative methods. Williams (220) made 
a factor analysis of Berne’s Social Behavior Patterns in Young Children, 
and concluded that this type of analysis might be used to increase the effec- 
tiveness of the scale and to eliminate from it certain ambiguous items. 

Psychological Measurements of the Newborn Infant 


Investigators in the field of infant psychology and physiology have been 
active to ascertain what behavior is possible at birth, the changes that take 
place in the growth process, and how they are brought about. Responses 
during the first ten days of life have been most thoroughly studied because 
the infants are available for observation in hospitals and are subject to 
partial control. The problem is not to determine the infant’s characteristic 
mode of reaction in his protected environment, but to elicit responses or 
fluctuations in responses which he is capable of making to experimentally 
controlled stimuli of varying intensity, frequency, and duration. Many of 
the studies are preliminary and inconclusive. 

Reaction to color and light—Color vision in newborn infants has been 
investigated by J. M. Smith (195, 196) and by Peiper (179). Different 
methods were used by each experimenter and present results appear to be 
contradictory. J. M. Smith (195) studied the influence of visual stimula- 
tion on crying. Weiss (214) found that an increase in the intensity of light 
resulted in a decrease in bodily activity. This result has been substantiated 
by Irwin and Weiss (132, 134) and supplemented by Richards’ studies 
(186, 188) of the relationships of bodily and gastric activity in the neonate. 
Redfield (183) studied dark adaptation in 47 infants by lengthening the 
dark periods and holding light intensity constant. She reported that bodily 
activity decreased following lengthened periods of dark. 

Reaction to sound—Weiss (214) found that sound variations of different 
intensity also produced decreased activity. Visual and auditory stimuli 
which produced decrease in activity when presented independently, pro- 
duced markedly greater effect when presented together. Stubbs (199) ob- 
served the effect of duration, intensity, and pitch of sound on the responses 
of 75 infants under ten days of age. Stubbs and Irwin (200) applied 50 
stimulations of a tone of 580 cycles with a duration of .07 seconds to 6 
infants. Reaction time of the body was measured by the stabilimeter, and 
respiration by a pneumograph attached to the infants’ abdomens. With a 
loud tone the startle response occurred 70 percent of the time with an aver- 
age reaction time of .19 seconds, whereas the respiratory response occurred 
100 percent of the time with an average reaction time of .09 seconds. The 
respiratory reaction is the less variable measure. 

Reaction to various stimuli—Sherman and others (192) studied 317 in- 
fants, from a few hours to sixteen days of age, using stimuli of precisely 
measured intensity, rate, and duration. Reactions of the eye, response to 
pain stimuli, movements of the limbs, response to pressure, defense reac- 
tions, grasping, and miscellaneous responses were quantitatively recorded 
and analyzed. It was found that behavior patterns were inconsistent. Daniels 
and Maudry (89), Dockeray and Rice (98) and Delman (92), studied the 
responses of infants to tactual stimulation. Although there was some pat- 
terning, the results showed that there was a tendency toward mass action 
rather than specificity of response. Taylor (201) applied “rage” and “fear” 
stimuli as described by Watson to 40 infants, aged from one to twelve days. 
No constant pattern of response was evoked by the different stimuli. Irwin 
and Weiss (133) found that 94 percent of the newborn infants studied 
showed less activity and crying when clothed than when unclothed. Crudden 
(87) investigated the threshold sensitivity in 9 sleeping infants between 
two and forty-four days of age. 

Palmar and plantar responses—Wenger and Irwin (217), measuring 
the amount of palmar and plantar skin resistance to electrical currents of 
fifteen newborn infants and six adults, concluded that increase in resist- 
ance is not a criterion of sleep, but that these increases in resistance are 
related to muscular relaxation. Pratt (182) made an extensive study of the 
generalization and specificity of the plantar response in newborn infants. 
Richards and Irwin (187) concluded from an examination of the experi- 
mental literature regarding the plantar responses, that there is much dis- 
agreement in the reported findings. Utilizing improved technics on 264 
subjects under sixty-six months of age, they carried out experiments to check 
certain discrepancies. They found that no typical response was revealed, 
and concluded that the plantar responses are variable, less active with age, 
and become negligible at about the fifth year. 

Opisthotonoid reaction—Irwin (135) analyzed cinema records of the 
opisthotonus or backward curving of the vertebral axis, made of a group of 
normal infants during the first two years of life, for transformation of this 
behavior pattern in relation to age. The possibility of using this reaction as 
an index of development is suggested. 

Patellar reflex—Hazard (123) obtained patellar reflex time measures 
from 399 subjects ranging in age from birth to twelve years. “Reflex time 
increased gradually with age within the age limits of the study. . . . Con- 
ducting rate increased rapidly with age from birth to six years; more grad- 
ually from the sixth to the twelfth.” 

Conditioned reflex response—On the basis of a study of three infants 
eleven days to one month of age at the beginning of the experimentation, 
Kasatkin and Levikova (145) concluded that, in response to auditory 
stimuli, conditioned alimentary reflexes appear at the forty-fifth day of life. 
The main role is played by age and not by the number of stimulations. 
Considerable individual variation is apparent in the simplest auditory dif- 
ferentiation. Auditory differentiation already formed, may disappear under 
the influence of outer or inner factors. Wenger (218), using improved tech- 
nics, investigated the conditionability of the neonate and concluded that 
although some forms of conditioning were possible in some infants as early 
as the fifth day after birth, retention is low, the responses are unstable and 
not easily obtained. 


Motor Development 


The emergence and growth of specific motor patterns of the same individ- 
uals at successive ages have been studied by direct observation, supple- 
mented by the analysis of cinema records. Gesell and others (114) pre- 
sented in normative gradations the sequential response of infants to specific 
situations from the earliest manifestations to the established patterns. 
Halverson (119) investigated the complications of the early grasping in 
infants aged four to twenty weeks. Early grasping was described as a two- 
phase activity of closing and tightening the fingers, which is not confined 
to the hand but is part of a total dynamic closure pattern. Gesell and Hal- 
verson (109) investigated the emergence and development of motor co- 
ordinations involved in thumb opposition in infants at successive ages from 
four to fifty-six weeks. Gesell and Ilg (111) presented the sequential de- 
velopment of the behavior manifestations and mechanisms of feeding, based 
upon consecutive observations of infants. Existing information on the de- 
velopment of upright posture has been supplemented by Thompson’s care- 
ful and detailed study (203). Cinematography has proved to be especially 
useful in analyzing motor development and it has been used extensively for 
this purpose. Numerous films are now available for scientific study, a few 
of which are enumerated (108, 110, 112, 113, 115, 135, 160, 206, 207, 208, 
209, 210). 

The effect of exercise—Dennis (94), in his consecutive observation of 
fraternal twins, from birth to the fifteenth month, reared under restriction 
on reaching, sitting, and standing, concluded that retardation as compared 
with existing norms was the result of the restrictions, since with subsequent 
exercise normal standards were achieved. The effect of exercise upon certain 
motor performances was investigated by McGraw (161) by the method of 
co-twin control. Daily exercise was given the experimental twin from birth 
to twenty-two months of age. The control twin was restricted in motor 
activities until his twenty-second month after which he received for a period 
of three months the same training as the experimental twin. Achievement and 
persistence of the experimental twin were markedly accelerated. 

Locomotion—In contrast to Herrick’s theory, Levy and Tulchin (152) 
concluded from their study of “all four” locomotion that this pattern occurs 
much more frequently than is commonly supposed, and when it appears, 
is a part of an orderly process in locomotor progression between the creep- 
ing and standing stages. 

Racial comparisons—In comparing the motor abilities of Negro and 
white children between two and five years of age, Rhodes (185) reported 
that there is striking similarity between the rate of motor development and 
the organization of motor abilities in the two races. Harmon’s study (120) 
of 133 children—Italian, Mexican, Negro, Jewish, and Indian—revealed 
marked differences in reaction time between the groups. The Italian children 
showed a more mature type of reaction age for age, and the Indians were 
found to be slower than the other groups tested. 

Handedness—Dennis (96), in a study of non-identical twins that were 
not conditioned by right-handed or left-handed presentation, found that 
lateral preference developed in dissimilar ways. Roos’ study (189) of 
486 cases failed to reveal, within the age range of the experiment, any 
causal relationships between handedness and the dominant position of the 
fetus, the birth position, the weight of the child, or the basal metabolism 
rate of the pregnant mother. Kraskin (151) concluded from a study of 
handedness in four age groups, from infancy to early adulthood, that 
handedness is a specific individual trait which is not related to intelligence, 
prenatal environment, or age. 

Mental-motor correlations—Mental-motor correlations have been studied 
by Bayley (74). Comparing the mental test scores and the motor test 
scores of 61 infants tested at monthly intervals from birth to three years 
of age, she concluded that motor growth is more rapid than mental 
growth for the infants studied up to the age of twenty-one months. 
Functional independence of intellectual ability and motor ability in- 
creases gradually as maturity proceeds; the two abilities are therefore more 
closely related in infancy than in adulthood. Growth curves show that both mental 
and motor growth proceed more rapidly in the early months and are de- 
celerated later. From a report based upon mental and anthropometric 
measures of 125 girls and 127 boys, examined at successive ages between 
twenty-one months and seven years, Honzik and Jones (130) concluded that 
physical and mental superiority are to a slight degree associated. 


Mental Development 


Sequential stages in mental growth are expressed by the infant in his 
adaptive behavior throughout the situations which Gesell and others (114) 
summarized. Blatz and others (75) reported the results of tests administered 
to the Dionne quintuplets at bimonthly intervals from the eleventh to the 
thirty-fifth month of life. Gesell’s developmental schedule was used through- 
out, supplemented by the Kuhlmann-Binet and the Merrill-Palmer scales. 
The results are quantitatively and qualitatively analyzed and are presented 
in a series of graphs. The mental development of the five children is below 
the norms and is not identical. 

Cultural influences on mental test ratings—Wellman (215) studied the 
effect of preschool attendance on mental test ratings, and showed that 
children who had attended the university preschool made higher scores on 
American Council on Education Tests and college examinations than did 
a comparable group without nursery school experience. Coffey and Well- 
man (84) studied the changes in intelligence quotients of more than 400 
children in relation to fathers’ occupations and the education of parents. 
Skeels (194) studied the mental development of 73 children in foster homes 
and found that the mental level of these children was higher than would be 
expected from the educational, socio-economic, and occupational levels 
of their true parents. Studies of mental development as related to institu- 
tional environment have been made by Crissey (85, 86) and Durfee and 
Wolf (101). Bayley and Jones (73), in a cumulative study from infancy 
to six years, investigated environmental correlates of mental and motor 
development. The composite rating for socio-economic factors showed slight 
or negative correlation with mental test scores to eighteen months. Co- 
efficients increased steadily thereafter. The relation of the different socio- 
economic factors to mental test scores and to age was presented. 

Sequential studies of specific functions—M. M. Lewis (154) used the 
biographical method in his study of infant speech recorded in phonetic 
script from the earliest utterances. M. E. Smith (198) studied 220 children 
between the ages of eighteen and seventy-two months in regard to some of 
the factors which influence the development of the sentence. Hildreth (126) 
traced the developmental sequence in name-writing of children three to 
six and one-half years of age with a median I. Q. of 120. Block-building in 
young children was studied by Guanella (117). The stages observed were as 
follows: non-structural; linear (at about ninety-seven weeks); areal (at 
one hundred and fifty-three weeks); and tridimensional (at about one hun- 
dred and ninety weeks). 

Incidence of specific functions in relation to age—Nelson (173) analyzed 
the performance records of 67 preschool children, from twenty-six to sixty- 
four months of age, and ranging in I. Q. from 96 to 168. In dealing with 
a simple problem in rational learning the children, while requiring more 
time and making more errors, follow adult patterns. Children as young as 
three years show the ability to eliminate logical errors. Somewhat in con- 
trast to this study is Maier’s study (162) of reasoning in 39 children 
from forty-three to ninety-five months of age, which indicated that reason- 
ing ability, as inferred from performance on the experimental problem, is 
relatively late in maturing, rarely developing to any marked degree below 
six years of age. The ability matures at widely different ages and its ap- 
pearance is related to mental age. Mallay (164) investigated the latent 
memory span of nursery school children. Gutteridge (118) observed 417 
children aged two to five years in play and reported the average attention 
span in relation to age. Thrum (205) investigated the development of con- 
cepts of magnitude in children aged twenty-four to fifty-four months. 
Markey (167) investigated the imaginary behavior of 54 preschool children 
from records obtained during their free play, and in experimental situations. 

Davis (91) studied 436 children from five and one-half to nine and one- 
half years of age as to the use of proper names. M. E. Smith (197) studied 
the speech of eight bi-lingual children and concluded that the rate of error 
is higher than that reported for two- and three-year-old mono-linguals. 
Change to a bi-lingual environment is more difficult for infants under 
eighteen months of age than for those who are more advanced in age. 
Olson and Koetzle (175) devised a reliable method for recording the 
amount and rate of talking in young children between the ages of forty-six 
and sixty-eight months. Quantity of talking in young children is not an 
index of mental capacity. Quantity and rate yield a correlation of .13. 
Individual differences are pronounced, and boys tend to speak less but at 
a faster rate than do girls during a given period of time. Davidson (90) in- 
vestigated the letters which are confused and the extent of this confusion 
among kindergarten and first-grade children. She found that errors, with 
one exception, fall into reversals and inversions. There is a marked de- 
crease in the percent of errors with increase in mental age, and children 
pass through certain stages of development before they are able to dis- 
tinguish b, d, q, and p. A larger percent of boys than girls make the errors 
studied, which is important since boys present a larger number of reading 
difficulties. 

Using the technic of Peck and Walling (177), Peck and Hodges (178) 
investigated the eidetic imagery of white, Mexican, and Negro children of 
preschool age. The results indicate that the Negroes possess a higher degree 
of eidetic ability than do the white and Indian children. Recent experi- 
ments on color and picture preferences among young children include a 
study by Hildreth (125) which established orange as the favorite color, 
with pink second, and red third, for a group of 78 boys and 60 girls, 
aged three to six years. Olney and Cushing (174) recorded the time each 
of 56 nursery school children spent in looking at various types of pictures, 
and found mechanical subjects involving people the most popular, and small 
animals, silhouettes, and animal activities the least popular. Jersild and 
Bienstock (138) found from observation and detailed analysis of cinematic 
records on 94 children and 17 adults that scores on accuracy in keeping 
time to music rose substantially from two to five years and that adults 
scored about twice as high as five-year-olds. Five subjects were given oppor- 
tunity to practice and their final average score was almost twice as high as 
the initial score. Much of the change during practice arose through co- 
operation and interest, as distinguished from improvement in ability. 


Personality Development 


Sequential studies of emotional behavior—The study by Blatz and Milli- 
champ (76) of emotional development in infancy was based upon five 
children who were observed by mothers in the homes for emotional episodes 
from birth to the second year of life. Types of behavior and the frequency 
of their occurrence in relation to age were presented. Based on the systematic 
observations of 127 children and supplemented by the findings of others, 
Bridges (80) discussed the emergence and differentiation of the primary 
drives and presented a schematic outline of their development. 

Incidence of emotional behavior—Dennis (95) concluded from his 
systematic observation of twins that smiling is a conditioned response 
which becomes attached to any stimulus which leads to relief. Blatz and 
others (77) concluded that laughter and probably smiling may be con- 
sidered as “socially acceptable tics” or compensatory motor mechanisms 
accompanying the resolution of conflict. Jersild and Holmes (137, 139) 
have used various approaches in the studies of children’s fears, including 
an experimental study of 105 nursery school children. The fears are 
analyzed and reported in relation to age, sex, intelligence, and socio- 
economic status. Holmes (129) followed this with an attempt to overcome 
fears. Foster and Anderson (107) investigated unpleasant dreams in
children aged sixteen months to twelve years. The types of dreams and their
frequency are reported in relation to age and other factors. Schramm (190)
compared the responses of children to animals in various situations. S. J.
Lewis (155) discussed the problem of thumb-sucking from its effect on
physical structure, and Levy (153) considered it from the psychiatric
angle. They concluded that it is not possible to generalize about this activity
since it arises from highly specific and complex individual needs. An
analysis by Koch (149) of certain forms of “nervous habits” in nursery
school children observed in 400 one-half minute periods during eight months
suggested that many factors such as local irritations, the gross bodily
movements involved in the activity, boredom, internal conflict, restraint,
and degree of aggressiveness, contribute to mannerisms. Blatz and Ring-
land (78) observed 71 children between the ages of two and seven years in
half-hour periods at various times during the school day to discover the
frequency, nature, and origin of tics. They concluded that tics are ex-
tremely common among children of these ages and are more frequent
when gross bodily movements are inhibited. Tics involving the mouth are
most frequent.

Culture and emotional patterns—Hattwick’s findings (122) on the
interrelationships between certain factors in the home and the behavior
of 335 children between the ages of twenty-three and sixty-eight months tend
to agree with former studies, notably Fitz-Simons (105), which showed
that children whose homes reflect overattentiveness are liable to display
infantile, withdrawing types of reaction and that inadequate attention in
the home and aggressive types of behavior in children are related.

Emotional concomitants of behavior—Johnson (141) devised three
experimental situations in which eight nursery school children were moti-
vated progressively to attain a recognized goal in the face of annoying or
fear-provoking interference. Results indicate marked variation, from
complete withdrawal to persistence. Kendrew (146) studied the persistence
of moods experimentally aroused in children from five to seven years of
age, and their effect on a natural rate of working. In nine out of twenty
cases disappointment appeared to decrease the output in subsequent activity.
In eight cases it increased the energy output. McClure’s study (159) of the
effect of varying verbal instructions on the motor responses of 39 children
ranging in age from twenty-seven to seventy months showed that encourage-
ment was more effective than discouragement, and emphasis on success
more effective than emphasis on failure at the ages studied. Reward as a
motivating factor in child learning seemed to act more strongly than an
unpleasant incentive in an experiment by Mast (168) with 43 children
aged thirty to seventy months. Mayer (169) observed the resistant reactions
of 277 children below six years of age to the New Stanford-Binet Scale
and found that negativism is most frequent at thirty-six months.

Incidence of social development—Two trained observers recorded the 
social behavior of the Dionne quintuplets from the twelfth week to the 
thirty-sixth month of life (75). Mallay (165) in a study of 21 children
aged twenty-four to fifty-seven months, concluded that learning by experi- 
ence is more important in establishing successful social contacts than is 
maturation. Portenier (181) concluded, from the observation of 25 chil- 
dren, aged two to five years, that social adequacy is largely a matter of 
integration and is the result of the total situation and the elements which 
have influenced the development of the individual child. Murphy’s excellent 
studies (171, 172) of sympathy in young children revealed, in addition to 
general trends related to age and intelligence, that specific characteristics 
of the situation determine behavior. 

Effect of culture on social development—Anderson (69, 70) extended 
a previous study which showed that the Iowa University Nursery School
Group is significantly more integrative and less dominative than a nursery 
school group and a non-nursery school group in an orphanage. Combined 
data for all groups indicated that girls are significantly more dominative 
than boys, and that boys are more integrative than girls. A study by Mallay
(163) of 21 nursery school children, observed in the fall and six months 
later, showed an increase in social behavior. 

Effect of training on behavior—Keister and Updegraff (147) contributed 
a study on the reactions of 82 children to failure. They reported that after 
a period of training children tried longer, manifested more interest in solv-
ing problems independently, and eliminated undesirable emotional be-
havior. This improvement was not believed to be a function of age but
was the result of a program of training. Page’s work (176) with groups of
nursery school children indicated that experimental training to increase 
self-confidence was effective in increasing manifestations of ascendent 
behavior of children as young as three years. 


Method 


Washburn (213) devised a simultaneous observational and recording 
method with a list of 20 symbols to be used in recording the various 
activity patterns in young children during the free play period. Loomis 
(157) pointed to certain pitfalls in applying the objective sociological 
method to the study of children and presented the general aim for further 
development of the method. Arrington (71) discussed the progressive re- 
finement which has taken place in the time-sampling method in studies of 
the incidence and patterning of various kinds of behavior in young children. 
On the basis of various empirical tests, she reported that the usual measures 
of average tendency and variability are applicable to irregular and J-shaped 
distributions obtained, except in cases where the sampling is inadequate. 
Harris (121) suggested a method of recording overt attention quantitatively 
during a test situation. Goodenough (116) recalled the many sources of 
error in the past that have accompanied the naive and superficial use of the 
time-sampling technic. She suggested the experimental approach in the 
study of child behavior whenever the problem under consideration can be 
adapted to the laboratory setup. Koch (150) reported the results of a
multiple factor analysis of certain measures of activity in nursery school 
children. M. C. Jones and Burks (143) reviewed the field of factor analysis 
and critically considered its implications for the problems under considera-
tion. Shock (193) devised an apparatus which provides continuous, simul-
taneous photographic records of physiological processes which H. E. Jones
(142) adapted and supplemented. The experiments reported appear to be 
satisfactory. Dennis (93) compiled a bibliography of some 64 baby biogra- 
phies of infants under three years of age. McCarthy (158) contributed a 
graphic age converting scale which covers the ages from birth to five 
years. Campbell and Breckenridge (82) studied the complex problem of 
cumulative record keeping for the individual child which will most com- 
pletely reveal the various growth processes and their interrelations through 
time. Thompson (204) discussed the inadequacy of the test construction 
criterion “increasing ability at successive ages” for properly assessing 
infant development. 


Summary 


During the period covered by this review there has been increasing 
precision in the methods used for investigating the newborn infant, result- 
ing in more objective, quantitative, standardized studies. Infant norms are 
more finely graded and are presented in intervals of weeks or lunar 
months. Several scales for children at the grade-school level and above 
have been extended to include the preschool level. Emphasis is placed upon 
the quantitative and qualitative analysis of the separate items of the scale 
as well as on the total score in developmental diagnosis. In studies of pre- 
school children the various factors are usually reported in relation to the 
mental age and intelligence quotient. Many of the subjects of these studies 
come from highly homogeneous groups with respect to intelligence, usually 
well above the level of the general population, which probably tends to 
lower the correlation reported between mental status and other factors. 
Rating scales and tests of specific abilities are numerous. More interest is 
evidenced in social and emotional behavior. Their incidence is reported 
in relation to age. Highly significant data are being obtained from cumu- 
lative growth studies of the same individuals studied intensively in inter- 
action with environment. 
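
The remark above, that highly homogeneous groups tend to lower the reported correlations, can be illustrated with the standard formula for restriction of range under direct selection on one variable. The following is a minimal sketch, assuming linear regression and equal scatter about the regression line throughout the range; the function name and the numerical values are hypothetical and are not taken from any study reviewed here.

```python
import math


def restricted_r(r_full: float, sd_ratio: float) -> float:
    """Correlation expected in a selected group.

    r_full   -- correlation in the unselected population
    sd_ratio -- S.D. of the selected group on the selection variable
                divided by the S.D. of the unselected population
    Assumes direct selection on one variable, linear regression, and
    equal scatter about the regression line (illustrative assumptions).
    """
    if not -1.0 <= r_full <= 1.0:
        raise ValueError("r_full must lie between -1 and 1")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    numerator = r_full * sd_ratio
    denominator = math.sqrt(1.0 - r_full ** 2 + (r_full * sd_ratio) ** 2)
    return numerator / denominator


# Hypothetical case: a population correlation of .50, observed in a group
# whose spread in mental status is only half that of the population.
print(round(restricted_r(0.50, 0.5), 2))
```

With an S.D. ratio of 1 the function returns the population value unchanged; any ratio below 1 yields a smaller coefficient, which is the direction of the bias suggested in the summary.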





CHAPTER IV 


Applications of Intelligence Testing¹


NOEL KEYS 


Despite LACK OF AGREEMENT as to the nature of intelligence, and un-
certainty as to the precise significance of so-called “intelligence tests,” the
indispensability of these instruments is attested by the continual recourse 
to them. The literature on applications is increasing in what seems almost 
geometrical progression. The bibliography for this chapter has been 
chosen from more than five hundred titles considered by the writer. Major 
considerations in the selection have been the importance of the problem 
treated, the objectivity of the data, the adequacy of sampling, the apparent 
statistical justification for the findings, and, to some extent, the novelty of 
the conclusions reached. Preference has also been given to publications 
most widely available to American research workers. 


1. Application to the Study of Individual Differences 
Human Variability 


Analyzing the range of capacities in a wide variety of functions, Wechsler
(394) discovered that in only 5 of 72 comparisons did the individual rank- 
ing 999th in a thousand exceed the ability of the individual ranking second 
by more than 2.7 times—or the Napierian base, e. From this he ventured 
the hypothesis that there are biological limits to human variability which 
generally serve to restrict it to a ratio of less than three to one. 

The irregularity of ability profiles among morons was studied by Durling 
(272); among university freshmen, by Chen (254); and among ten-year-
olds selected as having I.Q.’s between 90 and 110, by Stout (385). In em-
phasizing the extreme unevenness of abilities in a given individual these
writers took no account of possible errors of measurement. 


Problems of Nature and Nurture 


Several major contributions during the last three years would seem to 
have increased the weight assignable to environmental factors as deter- 
minators of mental differences. 

Twins—The Chicago twin study of Newman and others (346) saw full 
publication in 1937. Their intensive comparisons of 50 pairs of fraternal 
twins reared together, 50 pairs of identical twins reared together, and 19 
pairs of identicals reared apart, pointed to the conclusion that the ratio of 
the contributions of nurture to those of nature fluctuates greatly, depending
apparently on whether heredity or environment varies more widely in the
situation considered. Richardson (359) inferred that the similarity in
environment of twins is sufficient to increase their innate resemblance from
an r of .50 to one of .81.

¹ Bibliography for this chapter begins on page 327.

In a novel approach especially remarkable for the number of cases as- 
sembled, Rosanoff and associates (362) analyzed the comparative incidence 
of mental deficiency other than mongolism in 126 pairs of monozygotic and 
240 pairs of dizygotic twins having one or both members of each pair 
affected. They concluded that hereditary and germinal factors, while impor- 
tant, are by no means essential to mental deficiency, and that the evidence 
points to cerebral birth trauma as an etiological factor of utmost impor- 
tance. Byrns and Healy (244) discovered an inferiority of 10 percentile 
points in average intelligence scores among twins as compared with other 
high-school pupils. 

Foster children—A study by Leahy (331) of 194 children adopted under 
six months of age is noteworthy for the exceptional care with which factors 
operating to induce selection in placement appear to have been excluded, 
and the foster and true parent groups matched as to occupational status, 
urban residence, and the like. Under these conditions, the I.Q.’s of the
adopted children showed a corrected correlation of only .21 ± .06 with
intelligence of the foster parents, as contrasted with a resemblance of
.60 ± .03 between a matched group of true parents and their children.
Schott (368) reported no significant gain in the I.Q.’s of 200 Michigan 
children following placement in foster homes. Skeels (375), however, 
found 73 adopted children to show a median Stanford I.Q. of 114 and a
Kuhlmann-Binet of 116, though born of parents of educational and occu- 
pational status markedly below average. These quotients, however, were 
obtained at an average age of only twenty-four months, and ran ten points 
lower for those children over thirty-six months. 

Parental occupation and home environment—For 1,000 ten-year-olds in
Czechoslovakia, mean Binet I.Q.’s ranged from 90 for children of day 
laborers to 117 for those of university educated parents (257:803-807). 
Differences between social groups under American conditions, however,
appeared somewhat less extreme (234, 247, 293). A study by Byrns and 
Henmon (246) of 100,820 high-school seniors showed children from the 
highest occupational classes deviating more widely from the average than 
those from the lowest groups. H. S. Hill (305) and Arsenian (227) found 
bi-lingualism reflected scarcely at all in the I.Q.’s of American children of
Jewish and Italian families, considered separately. Children four to nine 
years old provided with improved housing by a British slum clearance 
scheme showed small but significant gains over a control group, according 
to Dawson (264). 

Educational and social environment—There is accumulating evidence 
that the influence of schooling upon intelligence may be less negligible than 
studies of a decade ago would make it appear. Wellman (395) reported 
students having six or more years of attendance in Iowa University demon-
stration schools as consistently surpassing on intelligence tests at college
entrance those of equal initial I.Q. but shorter attendance in these schools.
Crissey (261) found children in institutions for the feeble-minded showing 
an average decline of 4.9 points in I.Q., as against a gain of 2.3 for a like 
period by orphanage residents of the same initial I.Q. One wonders whether 
this result might not reflect subtle selective factors which influenced the 
original commitments. 

The effect of improvement in reading upon measured intelligence con- 
tinues in dispute. Scruggs (369) noted a gain in the intelligence of Negro 
children under remedial instruction, whereas Hawthorne (300) obtained 
a correlation of -.17 ± .07 between changes in I.Q. and in reading scores
among 104 fifth- to twelfth-grade pupils selected for special instruction 
because of reading disability. Lazar (330) has also made a study of the 
reading interests of children. Klineberg (323) reported further evidence 
of the relation of intelligence in Negro children to length of stay in city, 
and Pieter (350) obtained a correlation of .80 between social environment 
and the I.Q. of children in Poland. 

Implications for eugenics—In certain Devonshire districts with much 
intermarriage, 54 percent of the children tested were mentally deficient, 
according to Rau (355). The high negative relation between fertility and 
occupational classes in Great Britain was analyzed by Bradford (233) and 
Cattell (249). The latter concluded that though the national intelligence 
is declining but 0.1 I.Q. point per annum, the effect on the extremes of the
distribution is pronounced. In one generation the percent of city residents
with I.Q.’s above 120 is likely to be diminished by 35 percent. From Amer-
ican data, however, Shuttleworth (373) reasoned that more than half of
the detrimental effect of present differential birth-rates can be attributed 
to the poor environment of children of the lower classes. His contention 
adds point to new evidence of the gross inequality of opportunity in Eng-
lish education (296). More than 50 percent of children with Otis I.Q.’s
of 130 or above were found to be without opportunity for higher education. 


Race and Sex Differences 


Race and national groups—Porteus (353) found that Australian aborig- 
ines outscore Kalahari Bushmen on various tests of learning capacity. 
This he regarded as in keeping with the extent to which each group has 
succeeded in mastering its environment. On the American Council on Edu- 
cation Psychological Examinations at the University of Hawaii, Chinese 
students surpassed the American, Japanese, and part-Hawaiian groups on 
tests of artificial language and arithmetic. The Americans led on all other 
comparisons, with the part-Hawaiians usually ranking lowest, according 
to Livesay (332, 333). Byrns (243) compared the intelligence of Wis- 
consin school children of thirty different nationality groups, and Garth and 
others (286, 287) reported further data on Mexican children in the South-
west. Their inferiority was shown to be greatest on verbal measures of
intelligence and school achievement, while on the Pintner Non-Language 
Test, they virtually equalled the white controls. From a survey of the lit- 
erature comparing intelligence scores of Negroes and whites, Witty and 
Jenkins (400) concluded that the differences observed are probably attrib- 
utable to factors other than heredity. 

Sex—Little of importance has been added to our knowledge of sex dif- 
ferences in intelligence. Kirihara (322) found Japanese males excelling 
females on four types of tests, except at certain ages below twelve. Terman’s 
contention that gifted boys outnumber girls at high-school level by two to 
one, however, was challenged by Witty (401), who discovered 0.32 percent 
of boys as compared with 0.35 percent of girls testing above 140 I.Q. Among
Negroes in Grades III to VIII, girls with Stanford I.Q.’s over 120 proved
more than twice as numerous as boys (315). 


2. Studies of the Relation of Intelligence to Other Traits 
Physical Traits 


Growth and development—Repeated measurements on 252 children ex- 
amined at intervals from twenty-one to eighty-four months of age were 
analyzed by Honzik and Jones (310). These disclosed a correlation between 
height and intelligence of .19 at seventh birthday. Both height and weight 
showed a residual relationship with mental superiority, even with socio- 
economic index constant. The tendency toward subnormality among victims 
of puberty praecox is well attested (321); but in a normal population, a 
small though insignificant superiority of 2.25 I.Q. of postmenarcheal over
premenarcheal girls of like age was ascertained by Stone and Barker
(384).

Appearance—Hollingworth (308) obtained photographs under uniform 
conditions of forty fourteen- and fifteen-year-olds with I.Q.’s of 135 to 190, 
and a control group with I.Q.’s of 90 to 110. When these were rated for 
beauty, the highly intelligent subjects averaged consistently higher by sev- 
eral methods of comparison, with differences amounting to about 1 A.D. of 
the distribution. 

Health factors—Brander found lower mentality associated with rickets 
(236), enlarged tonsils (237), and low birth weight (235) among 376 
school children of premature birth. Children receiving treatment for al- 
lergies, chiefly asthmatic, proved fully equal in intelligence to non-allergic 
controls, and little if any more retarded in school (389). The experimental 
literature showed little evidence of any marked influence of diet upon intel- 
ligence scores, according to Fritz (284), though achievement may be im- 
paired thereby. Physical ailments were found to be accompanied by reduced 
accomplishment ratios on the part of high-school pupils, and particularly 
those of over 115 I.Q. (382). Hinton (306) studied the basal metabolism 
of 90 five- to fifteen-year-olds free from physical disturbances other than
subactive thyroids. For this group metabolic rate correlated no less than 
.74 with Stanford-Binet, and .66 with Arthur Performance Scale quo- 
tients. A follow-up by Baller (229) of special-class children in Nebraska 
averaging 60 I.Q. showed a death-rate seven times that of a control group
with I.Q.’s from 100 to 120.

Athletic and motor efficiency—Athletic achievement continues to reflect 
positive but low correlation with intelligence. This appears traceable more 
to lack of success by the dull than to superior performance of the brightest 
students (318). Among boys in special classes for the mentally deficient, 
correlations of physical ability tests with intelligence ranged from low to 
marked, depending on the complexity of the feat (370). 


Social Traits 


Personality and interests—Thirty-five out of 78 English children with 
I.Q.’s of 140 to 180 brought to the Psychological Centre were “difficult” in
some respect, and “not less than ten” were definitely unhappy in school, 
according to Nevill (345). Hollingworth (309) held that children with 
I.Q.’s from 125 to 155 have best prospects of developing successful and
well-rounded personalities. The intelligence of college students was found 
unrelated to introversion or emotional stability (407), to frequency of
anger (339), and to attitudes toward war (280).

G. B. Smith’s investigation (377) of the high-school and college activities 
of 512 University of Minnesota students from 1925 to 1929 showed that 
publications and dramatics had attracted the more intelligent of both sexes 
at both school levels. Athletics and musical and social activities drew a 
less capable group in high school, and religious organizations in the 
university. 

Character and conduct—In a work of uncommon scope and thorough- 
ness, Chassell (253) synthesized the evidence from nearly three hundred 
studies in psychology, sociology, and criminology bearing on the relation 
between morality and intellect. On the basis of these, supplemented by two 
original investigations, she concluded that, in groups of restricted type and 
range, intelligence and conduct will show correlations of only .10 to .40, 
with the true agreement falling somewhere below .50. In the population at 
large, however, the correspondence would undoubtedly be greater, though 
scarcely as high as .70. Thorndike (390) also published a reanalysis of the 
relationship between morality and intellect in members of the royal fam- 
ilies of Europe, as studied by Woods three decades ago. After allowing for 
the estimated influence of “halo” and the counteracting effect of inadequacy 
of data, he decided that the true correlation is not far from .60. (Intelli- 
gence in delinquent groups is treated later in this chapter, under the head 
of “Clinical Applications.” ) 


Particular Talents and Abilities


Music and Art—Farnsworth (279) discovered no significant differences 
between tests of intelligence and of pitch and tonal memory as predictions 
of music grades of college students; but a controlled experiment with 
ninth-grade pupils reported by Lamp and Keys (326) indicated Terman 
Group I.Q.’s inferior to both pitch and tonal memory in forecasting per-
formance on brass, woodwind, and stringed instruments. Two studies (391,
408) confirmed previous observations as to the low positive relationship
between artistic talent and intelligence. 

Miscellaneous—Davis (263) found a slight correlation between length 
of spoken sentences and the I.Q.’s of primary school children. Pupils with 
I.Q.’s over 120 showed less forgetting of difficult school subject matter over
summer vacation than did those below 90, according to Kolberg (324). 
Ray (356) obtained a correlation of .89 between mental ages of twelve- 
year-olds and their success on 21 tests of ability to generalize. 


3. Intelligence Testing for School Purposes 


Surveys—Oakley (347) investigated the prevalence of psychological 
testing in the secondary schools of England, and Parsons and Moderow 
(348) the programs in 53 American cities. Eells (275) analyzed the returns 
from 198 schools participating in the Cooperative Study of Secondary
School Standards; C. Woody (404), the showing of Michigan high-school
sophomores on the American Council on Education Psychological Exam- 
ination; and Royer (364), the statewide survey of Oklahoma high schools 
and colleges by means of the Ohio State Psychological Test. Much informa- 
tion as to differences in intelligence levels according to sex, region, size, and 
type of institution, and curriculums pursued is available in these reports. 

Record systems—Hanley and others (298) developed a cumulative record 
system for English schools, and Embree (276) for American. The latter 
utilizes standard deviation scores to reduce test results and teachers’ marks 
to a comparable basis. 
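
The idea of reducing measures on different scales to a comparable basis by standard deviation scores can be sketched as follows. This is an illustration only, not Embree's actual procedure; the function name and the pupil data are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(values):
    """Express each value as its deviation from the group mean,
    in units of the group's standard deviation."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; standard scores are undefined")
    return [(v - m) / sd for v in values]


# Hypothetical data for the same five pupils on two different scales.
test_scores = [95, 100, 105, 110, 115]
teacher_marks = [70, 75, 80, 85, 90]

z_tests = standard_scores(test_scores)
z_marks = standard_scores(teacher_marks)

# After conversion both sets have mean 0 and S.D. 1, so a pupil's standing
# on the test and in teachers' marks can be compared directly.
for z_t, z_m in zip(z_tests, z_marks):
    print(f"{z_t:+.2f}  {z_m:+.2f}")
```

Because each converted score states only how far a pupil stands from the group average in S.D. units, the original units of the test or the marking scale no longer matter.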


Trend of Intelligence Scores 


In high school—Despite reduction in failures and decreasing selectivity 
of secondary schools, two independent investigators reported a noteworthy 
rise in intelligence scores over a period of years. Roessel (360) discovered 
in three Minnesota towns a gain in mean I.Q.’s on the Miller Mental Ability 
Test in 1934 as compared with 1920, for each school grade from the seventh 
to the twelfth. For high-school seniors, the rise was from 118 to 122. Rund- 
quist (365) reported an even more startling superiority of Minneapolis 
seniors in 1933 over those of 1929. Various possible explanations suggest 
themselves, not least of which would be the greater competency of exam- 
iners and increasing “test-wiseness” of pupils, as reflected in the findings 
concerning effects of practice on test scores (224). Six California high 
schools showed a mean I.Q. of 104 in 1933, as compared with 108 in
1917 (230). 

In college—The steady rise in intelligence of freshmen entering the arts 
college of the University of Minnesota from 1926 to 1934 is attributed by 
Williamson (397) to more intelligent guidance in secondary schools. Col- 
lege aptitude scores of Stanford University students showed a similar up- 
ward trend from 1923 to 1931, since when abandonment of competitive 
admission has led to a distinct reversal, according to Cowdery (260). An 
equivalent form of the Ohio Examination administered to 483 college 
students in their senior year revealed an average rise of 11 percentile points
over their scores as freshmen (299). Lyon (336) noted a correlation of
.44 between the mean intelligence in various Wisconsin high schools and
colleges and the size of their respective graduating classes; and Cavan 
(251) observed a similar relationship with size of enrolment among col- 
leges using the A. C. E. test, as well as a generally superior level of ability 
in private as compared with public institutions. The intelligence of students 
in evening and extension classes in upstate New York was surveyed by 
McGrath and Froman (337), H. P. Smith (378), and Strabel (387). Except 
in classes organized under the federal work-relief program, these students 
equalled or excelled corresponding on-campus groups. 


Predicting Achievement in Elementary School 


Rush (366), pointing out that most failures occur in Grades I and II
and are associated more closely with mental than with chronological age, 
argued that standards for admission should be revised to take account of 
the former. The futility of grade repetition as a remedy for too early 
entrance is shown by Arthur (228), who found that pupils repeating the 
first grade did no better, if as well, as beginners of the same mental age. 
Woods and others (403) suggested that reading might well be deferred 
until the child reaches the mental level of the average pupil in high first 
grade, and reported the successful operation of “transition groups” for 
those considered not ready for promotion into high first. An interesting 
experiment with four beginning groups by Gates (288), however, indicated 
that the mental age necessary for success in beginning reading is by no 
means a fixed quantity, but conditional upon the teacher, and the methods 
and materials used. 

Kyte (325) was impressed by the fact that 64 percent of 1,485 first- 
grade failures had I.Q.’s of 90 or above. This raises the question as to “why 
normal children fail to make normal progress.” Horn and Main (311) 
studied the grade placement of pupils between 90 and 110 I.Q. in Los
Angeles schools and found these strictly normal children to average one- 
tenth of a grade lower than they belonged by prevailing age-grade norms. 
These norms, however, included grade repeaters. When standards were cal- 
culated on the assumption that children enter first grade within six months 
of the minimum age permitted by law and progress regularly thereafter, 
the 90 to 100 I.Q. group were found retarded by an average of .53 grades,
or just over one semester. Present grade standards seem too severe for the 
average child. 


Predicting Achievement in High School 


General success—Mitchell (341), and W. H. Woody and Cushman (405) 
produced new evidence as to the high rate of elimination among pupils of 
less than average intelligence. Douglass and Wind (265), however, dem- 
onstrated that elimination is even more closely related to retardation, and 
to economic and cultural factors in the home. Of 1,373 pupils who failed in 
two major subjects in different semesters, those who graduated were found 
no different in age, and only five points higher in I.Q., than those who did
not graduate (294). Stejskal (257: 783-94) obtained a correlation of .66 
between a two-hour mental test and achievement in Czechoslovakian sec- 
ondary schools, even though applicants scoring below the 40th percentile 
were discouraged from entering. Hughes (312) reported coefficients of 
similar magnitude between intelligence and achievement tests administered 
to competitors for secondary-school scholarships in England. 

Embree (277) discovered no significant differences in the value of I.Q.’s 
of the three levels, 90 to 109, 110 to 129, and 130 to 149 for the prediction 
of honor-point ratios in senior high school. A combination of I.Q. and 
honor-point ratio in ninth grade was found to forecast average marks in 
tenth to twelfth grade, exclusive of music and physical education, to the 
extent r = .89. 

Particular subjects—Several studies treat of the prognosis of achieve- 
ment in specific subjects. Grinnell (297) observed Inglis Vocabulary scores 
to correlate .70 with average I.Q. on four tests, as against only .53 with 
an average of English marks over six semesters. He did not, however, 
correct for the higher reliability of I.Q.’s as compared with marks. For 
predicting achievement in United States history, Bolton (232) found the 
Wesley Test of social studies vocabulary superior to the Otis Group 
Intelligence (r = .65 versus .59). Hummer (313) reported Otis Group I.Q.’s 
below 100 to 110 indicative of failure in tenth-grade geometry, but the 
Otis Self-Administering Test gave an indifferent prognosis of physics 
achievement (376). From the literature concerning high-school mathe- 
matics, Douglass and Michaelson (267) rated both I.Q. and M.A. inferior 
to either special aptitude tests or average marks in previous year for pre- 
dicting high-school mathematics. Dunn (271) cited evidence as to the 
closer agreement of algebra achievement with prognosis when classes taught 
by different teachers were considered separately. Earle (273) sought to 
arrange school subjects into unitary groups. He claimed that tests for 
vocabulary, algebra, geometry, and science will provide an adequate basis 
for educational counseling. 
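
The correction which Grinnell omitted can be sketched with Spearman's correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The two observed coefficients below are those quoted above; the three reliability values and the function name are hypothetical, chosen only for illustration, since the summary gives none.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Observed coefficients quoted in the text.
R_VOCAB_IQ = 0.70
R_VOCAB_MARKS = 0.53

# Hypothetical reliabilities (assumptions, not taken from the source).
REL_VOCAB = 0.90
REL_IQ = 0.90
REL_MARKS = 0.60

corrected_iq = disattenuate(R_VOCAB_IQ, REL_VOCAB, REL_IQ)
corrected_marks = disattenuate(R_VOCAB_MARKS, REL_VOCAB, REL_MARKS)

# Gap between the two coefficients before and after correction.
gap_observed = R_VOCAB_IQ - R_VOCAB_MARKS
gap_corrected = corrected_iq - corrected_marks
```

Under these assumed reliabilities the corrected coefficients lie closer together than the observed ones, which is the reason the omission matters; different reliabilities would give a different result.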


Predicting College Scholarship 


An admirable summary of the literature to 1934 on the prediction of col- 
lege success, by Segel (371), should be of service to administrators, coun- 
selors, and research workers. Its thirty-one tables present in convenient form 
the findings on a wide variety of intelligence, aptitude, and achievement 
tests, as well as high-school marks and other factors. 

The numerous studies which have appeared over the past three years add 
little that is novel. From the showing of women students on Cambridge 
final Tripos, Dale (262) reached the conclusion that “the selection of stu- 
dents best fitted to pursue highly specialized degree courses does not appear 
to be made easier or more reliable by the use of mental tests.” Root (361), 
Ficken (283), G. A. A. Jones and Laslett (317), and Drake and Henmon 
(269) are among those confirming previous observations as to the inferior- 
ity of intelligence and aptitude tests to percentile rank in high-school class 
for the prediction of college grades. Landry (328), however, found Coop- 
erative Test Service scores and the College Entrance Board test of verbal 
aptitude, while less satisfactory than an adjusted average of twelfth-grade 
marks, nevertheless definitely superior to College Entrance Board examina- 
tions. Payne and Perry (349) reported that less than one in a hundred of 
students gaining scholarship honors and awards at the City College of New 
York scored below average on the psychological examination, and Mc- 
Quitty (338) found that student mortality at the University of Kentucky 
was six times as high among those in the lowest 3 percent on the classifica- 
tion test as among those in the highest 3 percent. Others (316, 361) reported 
satisfactory results from a combination of high-school marks and tests of 
scholastic aptitude. Douglass and associates (266, 267) were unable to 
obtain a multiple correlation higher than .50 with college achievement in 
either mathematics or social studies, and but .57 with courses in history. 
Relationships in the case of mathematics were markedly curvilinear. 

Factors influencing correlations—The influence of university practices on 
size of coefficients is indicated by various observations. Since comprehen- 
sive examinations have been made the sole measure of scholarship at Chi- 
cago, Reitz (358) noted an increased agreement between grades and scores 
on the American Council on Education Psychological Examination, and a 
slightly lowered correspondence with high-school rank. At Minnesota, Wil- 
liamson (398) remarked a decline in the agreement between prediction and 
performance for successive classes from 1926 to 1935. This he attributed to 
the increase in average intelligence and reduced variability of classes along 
with other outcomes of improved counseling procedures. 
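
The effect of reduced variability on the size of a coefficient, to which Williamson appealed, can be sketched with the usual formula for direct selection on the predictor. The formula assumes linear regression and equal scatter about the regression line; the coefficient of .60 and the ratio of .70 between restricted and unrestricted standard deviations are hypothetical figures, not values reported by Williamson.

```python
import math


def restricted_r(r_full, sd_ratio):
    """Correlation expected in a group selected directly on the predictor.

    r_full   -- correlation in the unrestricted group
    sd_ratio -- restricted SD of the predictor divided by its unrestricted SD
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    denominator = math.sqrt(1 - r_full ** 2 + (r_full ** 2) * (sd_ratio ** 2))
    return r_full * sd_ratio / denominator


# Hypothetical figures: r = .60 in an unselected class, with the spread of
# aptitude scores later reduced to 70 percent of its former value.
r_after_selection = restricted_r(0.60, 0.70)
```

With these assumed figures the coefficient falls from .60 to about .46, although the underlying relation between aptitude and grades is unchanged.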

Long-range forecasting—Of especial interest is a study by Byrns and Hen- 
mon (245) of the long-range prediction to be had from performance on the 
National Intelligence Test in Grades III to VIII. Quotients thus obtained 
correlated .81 with psychological tests at college entrance, .43 with the four- 
year average of high-school marks, and .45 with first semester grades in the 
University of Wisconsin. In view of the degree of prediction obtained, it is 
noteworthy that the mean I.Q. of Madison students entering the University 
had been only 109.3 when in elementary school. 


Informing Pupils of Scores 


Despite a gradually growing public understanding of mental measure- 
ments, there is little agreement as to how best to utilize test results. Below 
college level the consensus of school authorities appears to be strongly 
against acquainting either pupils or their parents with records made on 
psychological examinations. So fearful are administrators of possible mis- 
use of test results, that in many schools these are hidden carefully in central 
files to which even the teaching staff is forbidden access. There is evident 
need of experimentation concerning the effects of knowledge of intelligence 
scores by teachers, parents, and the children themselves. A study by Snyder 
(379) constitutes one approach to the problem. In an Ohio high school, any 
pupil who cared to learn his percentile rank on a schoolwide intelligence 
test was handed this information on a slip of paper, while all received a 
careful explanation of its significance. Three-fourths of those above the 30th 
percentile and half of those below that point took advantage of the oppor- 
tunity. Questionnaire replies showed 72 percent in the lower three-tenths 
and 90 percent in the upper seven-tenths favored the reporting of scores in 
this fashion. Opinion was almost unanimous that high scores did not mean 
that one could succeed without working, or low scores that success was im- 
possible. Other reactions seemed wholesome, and during three years of 
reporting scores no parent had objected. 


Discrepancies between Promise and Performance 


Individuals whose achievement differs widely from measures of their 
scholastic aptitude continue to attract investigators. Wolf (402) adminis- 
tered a variety of tests to 50 girls of Italian parentage doing unsatisfactory 
work in sixth grade, and to 50 others of similar Binet I.Q. but greatly supe- 
rior achievement. So far was the failing group from indicating any special 
bent for non-verbal pursuits that the Detroit Mechanical Aptitude Test 
showed a significant difference in favor of the good students. Six measures 
of adjustment and interests all favored the successful, the difference being 
highly significant in the case of the Woodworth-Cady. Eckert and Mills 
(274) compared Buffalo high-school seniors whose Regents’ Examinations 
placed them 1.2 σ higher on average than their intelligence scores, with a 
group whose Regents’ averages placed them 1.2 σ lower. These groups differed 
widely in percent of retardation in high school and in extent of participation 
in club activities. Differences in socio-economic status were small; 
measures of studiousness, however, markedly favored the academically 
successful. Reeder (357), likewise, discovered scores on the Wrenn Study 
Habits Inventory to add slightly to the value of intelligence scores, and 
found bright Negroes doing failing work in junior high school to be characterized 
by poor home conditions (269). Neel and Mathews (344) reported 
superior college students to be younger, happier, and participating in more 
campus activities but scoring more introverted and self-sufficient and less 
sociable on the Bernreuter Inventory than “non-achievers” of like superior 
intelligence. 

The significance of relationship between achievement and capacity as an 
indication of good teaching is emphasized by Corey (258), who found performance 
on a regularly scheduled examination correlating .52 ± .05 with 
Army Alpha scores, as against .06 ± .07 in the case of tests sprung without 
notice. Specific measures for bringing the accomplishment of abler college 
students more nearly up to their potentialities are discussed at length by 
Starrak (381) and Wrenn (406). 
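
The figures of .05 and .07 attached to Corey's coefficients are presumably probable errors, as was customary at the time; this is an assumption, since the summary does not say. A minimal sketch of the usual formula, PE = .6745 (1 - r²) / √N, follows. The number of cases is hypothetical, and the rule of thumb that a coefficient should be at least four times its probable error is a convention of the period rather than anything stated by Corey.

```python
import math


def probable_error_r(r, n):
    """Probable error of a correlation coefficient: 0.6745 * (1 - r^2) / sqrt(n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def exceeds_four_pe(r, n):
    """Rule of thumb of the period: treat r as dependable if |r| >= 4 * PE."""
    return abs(r) >= 4 * probable_error_r(r, n)


N_HYPOTHETICAL = 100  # assumed number of cases, not given in the source

pe_scheduled = probable_error_r(0.52, N_HYPOTHETICAL)  # announced examination
pe_surprise = probable_error_r(0.06, N_HYPOTHETICAL)   # unannounced tests

scheduled_dependable = exceeds_four_pe(0.52, N_HYPOTHETICAL)
surprise_dependable = exceeds_four_pe(0.06, N_HYPOTHETICAL)
```

On this assumed N the first coefficient is many times its probable error and the second is smaller than its own, so that the second cannot be distinguished from zero.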


Providing for Individual Differences 


Ability grouping—Part I of the Thirty-Fifth Yearbook of the National 
Society for the Study of Education is devoted to consideration of the theo- 
retical desirability and practical effectiveness of various types of pupil 
grouping. Chapter XV constitutes a review by Cornell (259) of the experimental 
evidence regarding sectioning for ability, leading to the qualified 
conclusion that “when attitudes, methods and curricula are well adapted 
. . . results, both objective and subjective, seem to be favorable” to such 
grouping. (See the Review of Educational Research for October 1937.) 

Special classes—Heck (302) summarized research on special classes for 
the mentally handicapped and gifted. Gates and Bond (289) reported inter- 
estingly on the practices of the Speyer School, established under joint super- 
vision of Teachers College and the New York City Board of Education for 
experimentation with special classes for dull-normal and exceptionally 
bright pupils. Under a highly liberalized curriculum, the former showed 
large gains in reading ability and conspicuous improvement in behavior 
and attitudes toward schooling. 

Acceleration of the bright—Several recent studies have combined to place 
in more favorable light the practice of accelerating bright students which 
has been a target for much criticism on subjective grounds. Sackett (367) 
studied children given special promotions in fourth or fifth grades. Eight 
semesters later they were shown to have gained more in E.Q. and A.Q. than 
those who had progressed normally, and teachers rated sixteen as having 
benefited from the acceleration to one who had not. Herr (303) compared 
97 pupils selected to complete the work of seventh and eighth grades in one 
year with a control group matched for sex, I.Q., E.A., school citizenship rat- 
ings, and high-school curriculum. The accelerates slightly excelled in their 
high-school studies, while in social adjustment they “did not differ from 
their peers.” A similar group interrogated at time of graduating from high 
school voted twenty-four to four that acceleration had not deprived them 
“of any honors or social advantages later in high school” (372). Wilkins 
(396) showed 282 pupils graduating from high school under seventeen 
years of age to be superior in school achievement, while their ratings as 
regards interests, activities, and age of companions preferred were inter- 
preted favorably. At Buffalo, Strabel (386) found boys and girls accele- 
rated in elementary school surpassing matched groups in all high-school 
activities save athletics, and showing to better advantage in scholarship, 
student activities, and fraternity memberships in college. 


4. Intelligence Testing in Occupational Studies and 
Vocational Guidance 


A survey of large industrial and commercial companies indicated that 
intelligence tests are not now and never have been widely used in the selec- 
tion of employees. This, Fryer (285) explained as due to practical consid- 
erations rather than any “disillusionment” as to the value of such meas- 
ures. Scores on two intelligence tests given the entire force of a public utility 
company, for example, were found to correlate .68 with supervisors’ ratings 
as to general value of the individual to the company (393). Selection of 
applicants in the light of their test scores served to reduce the percent of 
unsatisfactory employees by more than four-fifths. Army Beta tests of 
Czechoslovakian soldiers indicated 82 as the minimum I.Q. satisfactory for 
truck drivers, although men of I.Q. 94 performed as well as any group up to 
115 (257: 278-84). For numerous other European reports of the use of 
mental tests in selecting and classifying military, naval, and civil personnel, 
the reader is referred to the Psychological Abstracts of the past three years 
and the Proceedings of the International Conference of Psychotechnology 
(257). For a discussion of aptitude tests, see Chapter V of the present 
Review. 

Sizable groups of transient unemployed in cities of the United States 
and Canada have been examined with a variety of tests of abstract intelli- 
gence, mechanical and clerical ability by Brentlinger (238), Kaplun (319, 
320), Morton (343) and Bryan (240). The last named found an average 
I.Q. among literate transients of 73 for the whites and 58 for the Negroes. 
A study of 206 former special-class pupils with I.Q.’s under 70 showed that 
only 39 percent had been able to hold a job for any extended period (229). 

A follow-up by Proctor (354) of 1,500 adults tested on the Army Alpha 
thirteen years earlier revealed a significant difference of 15 points in I.Q. 
between those reporting an occupation and those naming none. Boardman 
and Finch (231), however, noted that College Aptitude Ratings below the 
40th percentile for university students were entirely compatible with later 
employment in positions of superior grade. 


Intelligence and Vocational Plans 


Despite a decline of four points in the average I.Q. in six California high 
schools over a sixteen-year period, the percent of pupils in 1933 avowing 
the intention of entering semiprofessional callings was fully as great as in 
1917, while the percent expecting to enter professions had decreased from 
only 27.6 to 26.0. The investigators, Bell and Proctor (230), concluded 
either that youth is incurably optimistic or that vocational guidance in the 
schools surveyed was “getting nowhere fast.” However, men at the University 
of Minnesota in 1933 who hoped to enter professions proved appreciably 
fewer in number and more select in academic aptitude than the 
group of like intentions four years earlier (399). Wrenn (406) compared 
the 195 junior college men ranking in the highest 5 percent on the American 
Council on Education Psychological Examination with the 157 from the 
lowest 15 percent. The former showed a much greater constancy of voca- 
tional objective over a two-year period, and more than twice the percent 
rating A in the occupation of their choice on the Strong Interest Blank. 


5. Clinical Applications 


Intelligence in Psychopathology 


Abramson (223) found the intelligence of 1,100 unstable children mark- 
edly below average, even though the feeble-minded were excluded. The 
showing was especially poor on group tests and such Binet subtests as the 
absurdities, fables, and digits backwards. Sullivan and Gahagan (388) 
obtained a median I.Q. of 92 for 103 epileptic children. Of interest is the 
observation of Cavalcanti (250) that among fourteen clairvoyants tested, 
the highest I.Q. was 83. From a digest of the literature, Rouvroy (363) declared 
that no level of intelligence or degree of deterioration serves to characterize 
any particular disorder. Duncan (270), however, found among 
manic-depressives a high association of manic manifestations with mental 
deficiency, as contrasted with melancholia among the merely dull. 

There is accumulating evidence of the superiority of psychotics on verbal 
as compared with performance tests. Uhler (392) remarked this pattern as 
distinguishing children referred for serious personality defects and psychotic 
tendencies, whereas among delinquents the reverse order was typical. The 
Babcock index of deterioration, based on superiority of vocabulary over 
intelligence in general, likewise continues to receive much attention. Jastak 
(314) and Simmins (374) found a lower index on the part of hospital 
patients who later attained discharge; and Chipman (255) noted a high 
index among the unstable as opposed to stable feeble-minded. Altman and 
Shakow (226), however, while observing that higher scores in vocabulary 
characterize schizophrenics in contrast to either delinquent or normal adults, 
found no correlation between actual size of the index and other criteria of 
degree of deterioration. 

The comparatively good showing on vocabulary in old age seems beyond 
question. Gilbert (290) observed a large and significant inferiority in 
mental efficiency by the Babcock test of persons in their sixties as compared 
with others in their twenties. Moreover, those members of the older group 
who were regularly employed surpassed the unemployed by the same 
criterion (291). This is in keeping with the finding that Minnesota University 
freshmen were outscored by their older relatives on vocabulary tests, 
when speed of response was disregarded (256). 

Effects of therapy—Landis and Rechetnick (327) detected a distinct 
improvement in the test scores of paretics following pyrexia treatment. The 
malaria cure likewise appeared to raise the mental age of these patients, 
when the latter had not fallen below five or six years (380). Memory proved 
more difficult to recover than other functions. 

Treatment of 317 mentally defective and retarded children was under- 
taken by means of a comprehensive remedial program, involving organ- 
otherapy, correction of defects, remedial education, and improvement of 
social environment (295). A gain in I.Q. followed for 45 percent of the 
endocrine cases, but only 1.2 percent of the 162 others, covering a wide 
variety of clinical types. 


Mentally Handicapped 


The characteristics and problems of mentally defective and scholastically 
retarded children are treated in two recent works by Burt (241, 242), with 
special attention to corrective procedures. For the mentally deficient, cura- 
tive measures appear almost negligible, though preventive precautions, both 
before and after birth, may be rewarding. Among the educationally back- 
ward, however, some 40 percent are found suffering from disabilities— 
physical, psychological, and otherwise—which may be relieved by appro- 
priate treatment with marked benefit to their accomplishment. 


Delinquents and Criminals 


Intelligence of delinquents—A statistical study by Fenton and Wallace 
(282) of 1,660 boys and girls referred to the California Bureau of Juvenile 
Research showed that delinquents and pre-delinquents together constituted 
the largest single group, and over 27 percent of the total. Except for the 
mental defectives, those cited for delinquency had the lowest I.Q. of any 
type of case, averaging 93, as compared with 91 for boys in the Whittier 
State School. In a midwestern reform school, Charles (252) reported 30 
percent of the whites and 48 percent of the Negroes testing below 70 I.Q., 
as compared with 1.2 and 3.5 percent of boys of the respective races 
and like age in St. Louis public schools. Moore (342) noted that white boys 
in the Tennessee state reform school tested significantly lower, even on the 
non-verbal Myers Mental Measures, than boys in the state orphanage. 

Lane and Witty (329) found an equally clear association of low mentality 
with delinquency, but pointed out that the I.Q.’s of 700 inmates of the 
St. Charles School were no lower than those of non-delinquents from the 
racial and social groups from which the majority of these boys were drawn. 
Moreover, no correlation was discernible between I.Q. and the number of 
their convictions, age of first arrest, or seriousness of the offense for which 
they were committed. The same lack of relationship between intelligence 
and either age at commitment or type of crime was noted by G. E. Hill 
(304) in the case of youths in the reformatory at Pontiac. Among this older 
group, however, the frequent recidivists tested actually somewhat higher 
than occasional repeaters and first offenders. Finally, a group of psycho- 
pathic personalities studied by Michaels and Schilling (340) disclosed no 
significant correlation between the gravity of their misconduct and perform- 
ance on either the Porteus Maze or Binet tests. 

Language abilities—Two studies serve to supplement the observations 
reported under “Intelligence in Psychopathology” concerning the relation 
between vocabulary and other abilities which tends to distinguish the typi- 
cal delinquent from psychopaths as a group. Glanville (292) noted the 
marked retardation of language development among industrial school boys 
in comparison with the level of their general intelligence; and Fendrick 
and Bond (281) stressed the characteristic deficiency in reading achieve- 
ment. Among the 187 delinquents examined by the latter, retardation in 
reading averaged five years and eight months. 

Psychiatric analysis—In an unusual approach to the study of causal fac- 
tors, Healy and Bronner (301) compared over a three-year period, 153 
serious delinquents with 145 non-delinquent siblings of near the same age. 
The delinquents proved only slightly inferior in intelligence to their sibling 
controls, and did not display any special aptitude for manual tasks. Strik- 
ing differences in personality and emotional adjustment, however, were 
revealed, 91 percent of the delinquents giving evidence of extreme unhappi- 
ness or other emotional disturbance, as compared with 13 percent of the 
controls. 

Among almost 10,000 consecutive prisoners coming before the Psychiatric 
Clinic of the New York Court of General Sessions, only 18 percent were 
classed as mentally defective or definitely psychopathic. The great majority 
of the remainder, however, were diagnosed as having serious personality 
defects and disturbances, such as extreme egocentricity, chronic alcoholism, 
and emotional instability, leaving only 21 percent regarded as normal and 
adjusted (239). 


Speech and Hearing Handicaps 


A speech survey of 1,174 school children by Carrell (248) revealed both 
I.Q.’s and school achievement below average in the 10 percent having speech 
defects. On the other hand, 87 stutterers located in Pennsylvania State College 
and the universities of Iowa and Minnesota showed a median I.Q. of 
118 on the Otis S-A Test. This is seven points above the median for 2,500 
college students from twenty-one institutions, and equal to the 69th percen- 
tile of that distribution. Steer (383) found this in keeping with certain earl- 
ier observations, and inferred that selection may operate more rigorously 
in the case of stammerers. 

An interesting study by Hofsommer (307) on somewhat limited numbers 
disclosed a seemingly marked influence of lip-reading upon I.Q. Among 17 
hard-of-hearing children receiving instruction in lip-reading over one to 
three years, eight recorded a gain in I.Q. and only two a decrease, whereas 
sixteen refusing such instruction showed twelve declines and no gains. 


6. Comment 


Few psychologists today look to an individual’s score on an intelligence 
test, alone and of itself, to determine the source of his difficulties or indi- 
cate the exact solution to his problems. It is entirely probable, however, 
that the outcome of such a test, judiciously chosen and competently ad- 
ministered, will contribute as much if not more to sound clinical appraisal 
than any other single fact obtainable. Properly supplemented with other 
diagnostic procedures, the information thus derived is virtually indispen- 
sable to intelligent attack upon a wide variety of problems. Moreover, its 
usefulness is increased and the danger of misinterpretation lessened by the 
steadily growing body of objective findings such as those exemplified in 
the studies here reviewed. 

So extensive is the literature that fresh investigations are scarcely ex- 
pected to cast important new light upon the relation between performance 
on our better known intelligence tests and academic achievement, talent, 
delinquency, dependency, or the like. More promising are those researches 
in which intelligence is measured as a means to analyzing the influence of 
various genetic and environmental factors upon mental development and 
efficiency, or as an aid in selecting subjects or groups for the experimental 
evaluation of practices in education, industry, social control, or physical 
and mental therapy. There remains, however, much valuable work to be 
done in determining the relative merits of different testing instruments for 
specific purposes and under comparable conditions. Especially is there need 
for comprehensive syntheses of data from many sources, with the working 
out of norms and indexes in forms most practically serviceable to the clini- 
cian, counselor, and social worker. 


CHAPTER V 


Vocational Aptitude Tests¹ 


L. J. O’ROURKE 


More THAN FIVE HUNDRED STUDIES in the general field of aptitude measure- 
ment have been examined in the preparation of this summary. Many sig- 
nificant studies have been omitted because of space limitations. 

The discussion roused by the publication in 1934 of Thorndike’s Predic- 
tion of Vocational Success continued into the period under review. Thorn- 
dike (529) explained the reasons for certain details of method of his 
original study, restating his belief that our ideal “should be to be able to 
make a complete and precise inventory of a boy or girl and to know the 
predictive value of every item in that inventory, and of all significant com- 
binations of items, for every important purpose of life.” An analysis and 
survey of the general subject of aptitude measurement was published by 
Bingham (417), who offered suggestions as to the more useful tests in 
each broad occupational field. The third edition of Laird’s book (475) 
has appeared. In 1936 Kornhauser (470) suggested that in order to advance 
the utilization of sound test procedure by industrial concerns we should: 


1. Concentrate on occupational fields where selective tests can be most clearly useful. 

2. Set up competent and economical testing services. 

3. Make the testing services known to those who would have use for them. 

4. Correct unfavorable attitudes by keeping all test work in competent hands and 
demonstrating what it can accomplish. 


Clerical Aptitudes 


Andrew (412) used multiple-factor analysis in studying the Minnesota 
vocational test for clerical workers, and reported that four relatively inde- 
pendent factors were revealed, viz., academic, clerical, spatial, and dexterity 
abilities. A committee headed by Dr. Marion A. Bills reached the conclusion 
that mental alertness tests are better than clerical for selecting clerical 
workers who do their jobs well and are promotable (439). Uhrbrock (531) 
also reported on the use of mental alertness tests in selecting clerical em- 
ployes. 

Davidson (438) reported correlations between test scores and promotability, 
or job level, attained after five or more years of service, as follows: 
Bureau Test VI, .75; Thurstone, .71; Modified Thurstone, .65; O’Rourke 
Senior, .77; Minnesota Clerical Numbers, .07; Minnesota Names, .34. 
Test scores correlate with supervisors’ ratings as follows: Bureau Test 
VI, .41; O’Rourke Senior, .40; Thurstone, .44; Modified Thurstone, .37; 
Minnesota Numbers, .27; Minnesota Names, .29. Copeland (436) correlated 
the Otis Self-Administering Test of Mental Ability and the Minnesota 
Clerical Test with efficiency and performance ratings for a group of 
supervisors, clerks, and enumerators. The coefficients of correlation range 
between .01 and .28. Clarke (431) reported on a battery of tests administered 
to a group of cashiers (N not stated). Correlations were obtained 
with the three criteria: (a) actual production as measured by transactions 
handled per day; (b) supervisors’ ratings; and (c) actual error record, as 
follows: Otis S-A Test of Mental Ability, (a) .003; (b) .25; (c) −.06; 
amount checking, (a) −.02; (b) .02; (c) .07; change-making time, (a) 
.23; (b) .06; (c) −.04; change-making errors, (a) .36; (b) .06; (c) .39; 
dexterity, (a) .31; (b) .39; (c) .10. After weighting the tests by means of 
a regression equation Clarke found a correlation of .59 between predicted 
and actual production. 

1 Bibliography for this chapter begins on page 334. 
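
The gain to be had from weighting tests by a regression equation can be sketched for the simplest case of two predictors. The validities of .36 and .31 are the correlations with production quoted above for the change-making errors and dexterity tests; the intercorrelation of .20 between the two tests is hypothetical, since none is reported, and the function name is likewise an assumption. The sketch is not Clarke's computation, which weighted the full battery, and it does not reproduce his figure of .59.

```python
import math


def multiple_r_two_predictors(r1y, r2y, r12):
    """Standardized regression weights and multiple R for two predictors.

    r1y, r2y -- correlation of each test with the criterion
    r12      -- correlation between the two tests
    """
    if abs(r12) >= 1:
        raise ValueError("r12 must lie strictly between -1 and 1")
    beta1 = (r1y - r2y * r12) / (1 - r12 ** 2)
    beta2 = (r2y - r1y * r12) / (1 - r12 ** 2)
    r_squared = beta1 * r1y + beta2 * r2y
    return beta1, beta2, math.sqrt(max(r_squared, 0.0))


R_ERRORS_PRODUCTION = 0.36     # change-making errors vs. production (from text)
R_DEXTERITY_PRODUCTION = 0.31  # dexterity vs. production (from text)
R_BETWEEN_TESTS = 0.20         # hypothetical intercorrelation of the two tests

beta_errors, beta_dexterity, multiple_r = multiple_r_two_predictors(
    R_ERRORS_PRODUCTION, R_DEXTERITY_PRODUCTION, R_BETWEEN_TESTS
)
```

On the assumed intercorrelation the weighted pair correlates about .43 with production, higher than either test alone; the less the tests overlap, the larger the gain.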

Chase and Darley (427) made a study of age changes and occupational 
test scores among employed and unemployed female clerical workers, 
divided into five age groups from twenty to seventy-two years. A battery 
of occupational tests and the Bernreuter Personality Inventory were ad- 
ministered. Individual differences were found to be more marked than age 
differences. Levitoff (481) studied the relationship between test scores and 
the learning of stenography. Two hundred applicants for entry to steno- 
graphic training schools were tested, the scores later being compared with 
efficiency as measured by teachers’ ratings and the results of speed tests. 
Correlations were as follows: general intelligence, .55; memory, .45; atten- 
tion .44; speed, .20; and manual dexterity, .18. 

Moore (496) reported a study at the Westinghouse Electric Company in 
which an American revision of the National Institute of Industrial Psy- 
chology clerical tests was used. After being tested, the clerks were re- 
assigned, the new work being in 90 percent of the cases the type that the 
worker preferred. The program is reported to have eliminated dissatisfac- 
tion and created the general attitude that promotional possibilities depend 
upon examinations, likes, and aptitudes. It is significant, however, that 
90 percent of changes made were promotions. 

Correlations between test scores and a work criterion for punchcard 
operators were reported by the United States Employment Service (533) 
in a mimeographed study; a substitution test yielded the highest correla- 
tions with the criterion. 


Mechanical Aptitudes and Manual Ability 


Working with 103 boys in a technical school Alexander (409) adminis- 
tered verbal, performance, mechanical, and G tests, using measures of 
achievement in shop work, mechanical drawing, and so forth. The data were 
submitted to multiple factor analysis and five factors emerged: Spearman’s 
G; V, the verbal factor previously found by Dr. W. Stephenson; F, a practical 
factor found especially in performance tests; X, a temperamental factor 
tentatively named “the will to succeed”; and Z, a fifth undefined factor. 
Factors X and Z seem to be unmeasured by intelligence, performance, or 
mechanical ability tests. Murphy (501), using verbal and non-verbal intelli- 
gence tests and mechanical aptitude tests, concluded that mechanical ability 
was a combination of a factor calling for speed of hand-eye coordination 
and a factor dependent on mental manipulation of spatial relations. Lang- 
don (476) reported a two-factor study of simple motor tests with particular 
reference to practice and diagnosis. 

After giving a battery of tests to over 2,000 subjects between thirteen 
and twenty, Heinis (434: 341-46) concluded that aptitudes mature at the 
conclusion of physical puberty and the beginning of mental puberty. Similar 
findings were reported by Messer (489) who gave 1,000 girls bead-threading 
and wool-knotting tests for selecting dressmakers’ apprentices. Scores indi- 
cate full development of simple manual dexterities by fifteen and one-half 
or sixteen years. Ozeretski (504) reported a metric scale for determining 
the development of motor ability of subjects four to sixteen years of age. 
The scale shows the total development of the motor level, and provides a 
profile showing general dynamic coordination, rapidity of movement and 
of simultaneous movements, and presence or absence of synkinesis. 

Frye (450) tested 200 siblings of high-school age with several mechanical 
aptitude tests, finding the only significant differences between the perform- 
ances of boys and girls to lie in the Stenquist test, on which boys excelled. 
Frye found a tendency for both boys and girls to reach their highest effi- 
ciency in mechanical test performance at the ages of seventeen and eighteen. 
With the exception of the spool-packing test B and the Stenquist test, per- 
formances of boys and girls between fifteen and eighteen years of age on 
a battery of motor tests reported by Viteles (434: 737-38) showed no sig- 
nificant sex differences. The Stenquist test yielded low correlations with 
academic grades when tried on 92 Negro boys (435). 

Quasha and Likert (514) made available a multiple-choice form of the 
Minnesota Paper Form Board Test, the revision being easier to score and 
having the advantage of objective scoring. Norms for a number of groups, 
including engineering students, are available. Bingham (419) reported that 
the MacQuarrie Test for Mechanical Ability correlates .29 with the Scovill 
Apprentice Scale and .38 with the Otis Higher Examination. Intercorrela- 
tions with other tests are given. He concluded that the MacQuarrie test 
is only a rough indication of the degree to which a person possesses some of 
the aptitudes desired in mechanical or manual occupations. That women 
seem to do as well as men in both tests was reported by Philip (506) in a 
comparison of an electric circuit tracing test with the O’Connor Wiggly Block 
Test (N = 471). Correlations of the tests with ratings in field work and labo- 
ratory courses (engineering students), and with ratings in practical work, 
wood-work, and electricity (technical students), ranged from .03 to .38. 

Metcalfe and Burr (490) revised and shortened the I. E. R. Girls’ Me- 
chanical Assembly Test and published norms for the shortened test. They 
eliminated four tests which gave the greatest administrative and scoring 
difficulties or contributed least to the discriminative value of the test. The 
median score of yearly age groups showed irregular increase; therefore, 
the authors offered a table of norms based on mental ages and recom- 
mended its use. These norms are of questionable appropriateness. An em- 
ployer is ordinarily only interested in the skill of the employee relative to 
the skill of other available girls, not in whether her score is up to the aver- 
age of other girls of her mental age. 

Griffitts (455) studied the relation between anthropometric measures 
and manual dexterity (N = 60). He found little, if any, relationship be- 
tween scores in mental tests and height, weight, or height/weight ratios. 
Correlations between test scores and various hand measurements are, with 
two exceptions, all below .30. 

Among the investigators making use of extensive batteries of eighteen 
and twenty tests are Rupp (434: 177-92), who reported norms for both 
defectives and normals in tests of mechanical skill and mechanical intelli- 
gence; and Piéron (507, 508), who tested 1,461 subjects between twelve 
and twenty years of age, in developing a scale of technical aptitude. Of a 
battery of tests given by Biegeleisen (416) to predict success of technical 
students, tests of information and mathematics proved most significant. 

Pritchett (513) reported that men who made high scores on the O’Rourke 
Mechanical Aptitude and O’Rourke Non-language Tests were, in general, 
considerably more efficient than workmen making low scores, received 
more promotions and fewer demotions, and were less frequently laid off. 
Over 81,000 applicants were examined. Christiaens (429), working with a 
small group, found that his tests for solderers selected men essentially as 
their employers rated them. In order to measure potential ability, he devised 
tests involving simple tasks which could gradually be made more difficult 
and which, on being repeated at twenty-four-hour intervals, would be an 
example of abridged learning. In a watch factory, a study of the efficacy of 
finger- and tweezer-dexterity tests was made by Candee and Blum (423); 
total time on the finger dexterity test was found to correlate .26 with fore- 
men’s ratings. Working with eighteen subjects, inspectors at a paper mill 
found correlations between test scores and output figures to be as follows: 
dexterity and choice reaction test weighted and combined, .71; dexterity 
test, .70; tactile discrimination test, .61; choice reaction test, .52; visual 
acuity test, .23; color discrimination test, .12; speed of perception, — .11 
(460). 
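
A rough sense of how much sampling error attaches to coefficients computed on eighteen cases can be had from Fisher's z transformation. The sketch below is an illustration only and is not part of the study cited; the function name `fisher_ci` and the 95 percent level are assumptions made here, and the method presumes roughly bivariate-normal scores.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r based on n cases.

    Uses Fisher's z transformation: atanh(r) is treated as roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 cases")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Two of the coefficients quoted above, each based on eighteen subjects.
for label, r in [("dexterity test", 0.70), ("visual acuity test", 0.23)]:
    lo, hi = fisher_ci(r, 18)
    print(f"{label}: r = {r:.2f}, approximate 95% interval {lo:.2f} to {hi:.2f}")
```

Under these assumptions the interval for the larger coefficient stays above zero, while the interval for the smaller one includes zero, which suggests caution in ranking the weaker tests on so small a group.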

Subnormals and mechanical aptitude tests—The problem of the voca- 
tional placement of subnormals continues to challenge psychologists. 
Measuring the manual skills and mechanical aptitudes of this group seems 
a promising line of attack. Pritchard’s study (512) of 79 subnormal boys 
twelve to eighteen years old with mental ages of from eight to fourteen, 
yielded correlations of .53 to .61 between mechanical ability scores and 
the criterion—a uniform ship construction task completed by each boy 
and rated independently by two judges. Correlations between mental ability 
and mechanical tests were generally less than .30. Attenborough and Farber 
(413), however, found a relationship between intelligence and manual 
dexterity of .58, and between intelligence and mechanical ability of .45. 
These investigators found evidence that a common factor G is measured 
by tests of verbal and non-verbal intelligence, manual dexterity, and me- 
chanical ability. In the Burckhardt investigation (421), manual functions 
of feeble-minded children showed a definite slowing down in comparison 
with normals, and defective coordination was also in evidence. 

Frandsen (449) gave the Minnesota mechanical assembly test to 100 
boys from 50 to 75 I.Q., finding the average percentile for this group to 
be 16. McElwee (484) found that among 150 subnormal children, degree 
of success in performing a construction test seemed to increase with chrono- 
logical age even though mental age was held virtually constant. Eighteen 
mentally defective girls skilled in the making of lace were tested by Taft 
and Kinder (526), who found that no significant relationship existed 
between ratings of the girls’ abilities and their scores on fourteen different 
performance and non-verbal tests. 


Professional Aptitudes 


Teaching—Studies of teacher selection will be found in issues of the 
Review of Educational Research dealing with “Teacher Personnel”: Vol. 
VII, No. 3; Vol. IV, No. 3; and Vol. I, No. 2. 

Law—The Yale legal aptitude test was used, together with undergraduate 
grades and intelligence test scores, in an attempt, reported by Husband 
(462), to predict grades at the University of Wisconsin Law School. The 
multiple correlation technic yielded a three-variable multiple of .64 which 
for three-fourths of the cases predicts actual earnings within five points. 
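
The link between a multiple correlation and the closeness of prediction can be sketched with the standard error of estimate. The code below is a minimal illustration, assuming normally distributed errors of prediction; the function names are hypothetical, and the spread of law-school grades it backs out is an implication of these assumptions, not a figure given in the study.

```python
import math
from statistics import NormalDist


def error_factor(multiple_r):
    """Standard error of estimate expressed as a fraction of the criterion SD."""
    return math.sqrt(1.0 - multiple_r ** 2)


def half_width(multiple_r, share):
    """Half-width, in criterion SD units, of the band holding `share` of cases.

    Assumes errors of prediction are normally distributed.
    """
    z = NormalDist().inv_cdf(0.5 + share / 2.0)
    return z * error_factor(multiple_r)


R = 0.64                      # multiple correlation quoted in the text
band = half_width(R, 0.75)    # three-fourths of the cases
implied_sd = 5.0 / band       # SD of grades that would make the band five points

print(f"error factor: {error_factor(R):.3f}")
print(f"band for three-fourths of cases: {band:.3f} SD units")
print(f"implied SD of the criterion: {implied_sd:.2f} points")
```

On these assumptions a multiple of .64 still leaves errors about three-fourths as large as those made by guessing the mean grade for everyone.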

Engineering—Bingham (418) pointed out that the young person choos- 
ing engineering as a profession should have an accurate picture of his 
general scholastic ability and his ability to learn mathematics, to think 
about space relations, to understand mechanics, to master the physical sci- 
ences, and to use proper English. The author suggested some of the avail- 
able tests to be used in making these determinations. A new form board 
for selecting engineering apprentices was introduced by Oakley (502). 
Hsiao (461) reported an experiment in testing engineering apprentices in 
Nanking, in which an extended battery of ten tests was employed; as norms 
were not available, selection was made on the basis of relative standing in 
the test scores. 

Medicine—An experiment in predicting achievement of dental school 
students was reported by Harris (458) who found that five mechanical 
aptitude tests which he administered had only chance correlations with his 
two criteria, but that the Otis Intelligence Test had a correlation of .552 
with first-year average marks. Intelligence and pre-dental scholarship, when 
combined, gave a multiple correlation coefficient of .670 with dental school 
marks. The social and economic backgrounds of students who were accepted 
and rejected by medical schools were studied by Schaul (519) who listed 
the factors which seem to characterize the backgrounds of each group. 
Moss (498) described the predictive value of the aptitude test that is ad- 
ministered annually to more than 9,000 prospective medical students in 
more than 600 colleges. The aptitude of the physician for specialization in 
roentgenology was studied by Chantraine (426), who devised a method of 
measuring the subject’s ability to read off, after two-, five-, and ten-minute 
periods of adaptation, figures of differing degrees of luminosity on a mov- 
ing cinematograph screen. 
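
How two predictors combine into a multiple correlation, as in the dental school study above, can be sketched with the usual two-predictor formula. Only the coefficient of .552 for the intelligence test comes from the text; the validity of pre-dental scholarship and the correlation between the two predictors are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math


def multiple_r_two(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_sq = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    if r_sq < 0.0 or r_sq > 1.0 + 1e-9:
        raise ValueError("the three correlations are not mutually consistent")
    return math.sqrt(min(r_sq, 1.0))


r_intelligence = 0.552   # from the text
r_scholarship = 0.60     # hypothetical, not reported above
r_between = 0.45         # hypothetical, not reported above

print(f"multiple R: {multiple_r_two(r_intelligence, r_scholarship, r_between):.3f}")
```

The gain from a second predictor shrinks as the two predictors correlate more highly with each other, which is why the multiple is usually only modestly above the better single coefficient.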

Nursing—Clark and South (430) found that of 408 students of nursing 
those in the two higher intelligence groups, as determined by scores on the 
Ohio State University psychology examination, tended to evidence a greater 
dislike for nursing than those in the two lower groups. A Prague study of 
intelligence and nursing school aptitude showed a correlation of .72 be- 
tween I.Q.’s and nurse examination scores among students who had been 
admitted without intelligence tests (434: 374-78). With one exception all 
subjects with I.Q.’s below 95 failed the examinations. A. P. Johnson (465) 
gave a battery of tests to two groups of experienced graduate nurses, find- 
ing that, in comparison with unselected population, the nurses possessed 
the following: relatively large vocabularies, either objective or extremely 
subjective types of personality, high tweezer dexterity, average finger dex- 
terity, high accounting, and low engineering aptitude. 

Executive ability—Cleeton and Mason (432) developed a selection in- 
strument for executives which included questions on arithmetic, judgment, 
a test in non-verbal or symbolic relationships, synonyms and antonyms, a 
questionnaire, and an interest blank. Their treatment of the general problem 
included analysis of executive functions and traits and outlines for training 
procedures. A. P. Johnson (466) found that the higher ranks of executives, 
irrespective of age, have larger English vocabularies than the unselected 
population. Since there is a high correlation between vocabulary and general 
intelligence, these findings may mean merely that executives have higher 
intelligence. Raphael (434: 175-77) described the technic used to select the 
managers of British branch banks. The one-hour battery included written 
tests for (a) accuracy in checking, classifying, etc.; (b) ability to analyze 
contents of letters; (c) intelligence, framed in banking terms; (d) general 
knowledge; and (e) social and business tact. No data on validity are as 
yet reported. 

Music—Prediction of success in music has to some extent been treated in 
earlier numbers of the Review of Educational Research, as February 1938, 
p. 58-59. Studies previously reviewed will not be repeated here. 

Kwalwasser (472) inquired into the nature of musical ability, consider- 
ing the elements which enter into “the complex hierarchy of talents.” He 
found that general intelligence has little or no significant relation to musical 
intelligence, and that sex differences in music test scores vary according to 
countries. European boys scored higher than girls, while in America the 
reverse is true. An inquiry into the nature of the creative process was under- 
taken by Patrick (505), who distinguished four stages of creative think- 
ing—preparation, incubation, illumination, and verification. Jacobsen 
(463) used both non-musical and musical instruments to measure the ability 
of subjects in dynamic and temporal control. 

Farnsworth (446) found that two music capacity tests surpassed intelli- 
gence tests as tools of prediction for grades in music courses only when 
these courses involved tonal perception and performance; for the more 
academic music courses intelligence tests were better predictors of grades. 
A new device for measuring pitch discrimination was described by Wyatt 
(537), who claimed for his instrument an accuracy surpassing that of the 
phonograph records of Seashore and Kwalwasser-Dykema. The relation of 
physical characteristics (in this case finger-length and hand-width) to 
musical ability was studied by Taylor (527), who found that, contrary to 
popular belief, his pianists’ hands were significantly wider than those of the 
control group (N = 40 and 3(, respectively), and that there is a fairly 
significant difference in the length of the fingers and in the finger-length 
and hand-width ratio. Among the problems attacked by foreign investiga- 
tors were: the possibility of determining whether there are children who 
are inaccessible to music (428), and the degree of ability to adapt to the 
rhythm of a metronome by children between three and six years (480). 
In spite of the advanced status of musical aptitude testing Seashore (523) 
concluded that we have as yet developed no adequate system of guidance 
for a professional career in music. He warned against any regimentation 
procedures. 

Art—The present status of work in the measurement of artistic abilities is 
reported in a survey made at the University of Oregon (500), and in a 
monograph by Kinter (468). The latter stated that ability in the graphic arts 
is probably more complex than musical aptitude, but that it is possible for 
the basic factors to be analyzed and measured with the same degree of suc- 
cess as that attending the measurement of musical ability. That artistic 
aptitude of children is capable of considerable alteration under intensive 
training was the conclusion reached by Saunders (518) after subjecting 
artistically inferior and superior children to a two-year program of special 
instruction and motivation. A new art ability test was developed by Knau- 
ber (469) for use in vocational guidance, as teaching material, and as an 
objective measure of the art ability of a subject in relation to a group. Cane 
(424) listed the signs, chiefly psychological, by which we discern outstand- 
ing artistic aptitude in a child. 


Transportation Aptitudes 


Automobile and streetcar operating—Progress in the standardization of 
separate tests in these fields compares favorably with that in other fields 
(477, 478). Some investigators undertook an elaborate psychophysical 
study of the individual; others isolated a single factor of driving ability 
and attempted to measure that; still others concentrated on detecting 
proneness to accidents. One of the more extensive batteries is that of 
DeSilva (440), who used tests for reaction time, steering ability, visual 
tests, and a general test of road behavior. The visual tests measured color 
vision, acuity, glare blindness, movement threshold, depth perception, and 
“tunnel vision.” Validation of the tests has yet to be completed, though 
steering tests correlated well with driving experience, and the author sug- 
gested they should form part of the examination for drivers’ permits. The 
National Institute of Industrial Psychology in England also has done con- 
siderable work with an extensive battery of tests of reaction time, resistance 
to distraction, vigilance, vision, visual coordination, judgment of spatial 
relationships, judgment of relative sizes of near and distant objects, judg- 
ment of speeds, and a simulated driving situation test (520). 

L. Johnson and Lauer (467) measured performance by right and left 
hands separately and by both hands, finding that the use of the single hand 
increases reaction time about 8 percent. B. Lahy (473) found that reaction 
time is affected markedly by fatigue; Missbach (494) discovered that 
reactive time may vary from 1.4 to 2 seconds, without any indication of 
carelessness. Ponzo (434: 284-91) reported that it is just as necessary to 
measure recovery time as reaction time. Greenshields (454) concentrated 
on brake-reaction time, while Heinis (434: 234-39), who devised measures 
of simple and choice reaction times, found that 7 percent of applicants were 
incapable of learning to drive, and that as many as 60 percent would prob- 
ably become unsatisfactory drivers! Ability to judge space—its depth and 
width—is one of the major qualifications of prospective drivers and tramcar 
operators, according to Drabs (443). He developed a new device, the tach- 
odograph, for measuring this ability. 

Variability of performance is the most important symptom of the unfit- 
ness of neurotics for driving, according to a series of researches done in 
Germany by Bena and Mayerhofer (414). Selling (524) is another investi- 
gator who included in his battery a “mental hygiene” test. Jekulin (464) 
divided causes of automobile accidents into four groups and reached the 
conclusion that the receptor rather than the motor functions should be 
stressed in training. That alcohol tends to make most drivers drive faster, 
without their becoming aware of the acceleration, was reported by Vernon 
(534), and that intelligence tests permit the elimination of the unfit, but 
do not select drivers of superior ability, was the conclusion of Mls (434: 
278-84) in a Prague experiment which showed that a minimum I.Q. of 82 
on Army Beta was required for satisfactory driving performance. Vaio 
(434: 293-302) found that 51 percent of the accidents of motormen and 
bus drivers occurred among the 16 percent of men whose scores were below 
60 on Army Alpha. Several other European investigators report interest- 
ingly on the use of elaborate psychotechnical methods of selecting streetcar 
motormen and bus drivers and, in America, Cleeton (433) used an atten- 
tion-reaction device to study 700 men for a city street railway system. He 
also employed tests of intelligence, emotion, and vision, and he stressed the 
importance of measuring emotional qualities. Mayerhofer (434: 256) used 
tests of reaction and attention to measure the discrepancy between the task 
and the available psychophysical energy. Quinan (515), measuring several 
hundred drivers charged with speeding, found an unusual incidence of 
left-handedness; he suggested that there is a possible relation between 
left-eyedness and nervous instability. 
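
The finding quoted above, that 51 percent of accidents fell to the 16 percent of men scoring below 60 on Army Alpha, can be turned into a relative accident rate by simple arithmetic. The sketch below assumes that both percentages refer to the same body of men and that exposure (hours and routes driven) was about equal in the two groups; neither assumption is stated in the source, and the function name is hypothetical.

```python
def relative_rate(accident_share, group_share):
    """Accidents per man in a group, relative to the overall average of 1.0."""
    return accident_share / group_share


low_share_of_men = 0.16        # men scoring below 60 on Army Alpha
low_share_of_accidents = 0.51  # their share of all accidents

low = relative_rate(low_share_of_accidents, low_share_of_men)
rest = relative_rate(1.0 - low_share_of_accidents, 1.0 - low_share_of_men)

print(f"low-scoring group: {low:.2f} times the average rate")
print(f"remaining men:     {rest:.2f} times the average rate")
print(f"ratio of the two:  {low / rest:.1f}")
```

If the exposure of the two groups differed, the ratio would shift accordingly, so it is best read as an upper-level summary of the reported shares rather than as a measured risk.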

Aviation—Aptitude for aviation has received considerable study, chiefly 
by foreign investigators. The fact that large differences in flying ability 
exist seems to many investigators to point to the existence of a definite “fly- 
ing talent” with distinct physiological and psychological components to be 
measured. Metz (491) in his study stressed the importance of the ability to 
orient in three-dimensional space which involves, among other things, 
vision, the vestibular organs, and muscle and pressure senses. Another 
German investigator, Edelmann (445), would shift the prevailing emphasis 
in aviation selection technics from physiological to psychological elements, 
not excluding, however, physical sensitivity measures, such as reaction time. 

That the total-personality picture of the prospective air pilot is the most 
important problem to be studied is the thesis of Gemelli (451), who called 
attention to nervous and mental fatigue and changes in attentive power 
and psychomotor reactions during prolonged flying. Pochtarova (510) 
standardized an interview for aviators; another technic outside the per- 
formance category was that by Lottig (482) who developed a writing test 
for altitude flying. This test consists in writing from six to eight figures 
correctly spaced in a horizontal line, and in writing numbers in exactly 
vertical columns. This simple test is claimed to predict with desired accuracy 
a subject’s tendency to air-sickness and the degree of its manifestation. 
Azoy (434: 203-15) held that the main criteria for selection of aviators are 
the aptitudes for perception and motor reaction. 

Extraverts have a better chance of completing flying training than do 
introverts, according to McIlnay and Jensen’s study (485) of 538 cadets 
and student officers who entered our army air corps training center. Types 
of reality adjustment which characterized the students who graduated and 
those who failed were investigated and the authors reported that it is more 
important to determine the proportionate energy expended in the mecha- 
nisms than their actual degree in the individual. Princigalli (511) worked 
on the chronaxy test for the measurement of vestibular excitability, using 
both trained pilots and students as subjects and proceeding on the assump- 
tion that practice in flying should have resulted in a labyrinth hypoexcita- 
bility. The entrance examinations for Brazilian aviators seem to differentiate 
three groups—pilots, observers, and administrators—on the basis of scores 
made in a battery of tests for quick-wittedness, observation, logic, intelli- 
gence, control of emotions, and strength of will (420). In Argentina the 
serious accident rate is reported to have diminished since the establishment 
of a psychophysiological laboratory for the examination of military aviators 
(492). The correlation between scores on the O’Rourke Complex Coordi- 
nator for testing American flyers and flying ability is at present being 
verified on the basis of follow-up studies of later performance. 

Railroad occupations—This field has received considerable attention 
abroad. Among the dozen or more studies reported is that by J. M. 
Lahy and Korngold (434: 245-55). Psychotechnicians in Germany began 
in 1917 to test for three different groups of railway employees—shop work- 
ers, office employees and ticket sellers, and the technical workers (495). 


Miscellaneous Vocational Testing 


Accident proneness—High scores in aptitude tests seemed to characterize 
apprentices who would be comparatively free from industrial accidents, 
according to a German study by Dombrowsky (442). Left-handedness was 
found by Grundler (456) to exist somewhat more frequently among accident 
prone employees than among the accident free employees in iron and steel 
industries. He also found six tests out of a total of nineteen which differen- 
tiated accident prone employees. That intelligence and emotional stability 
are more important in predicting accident proneness among handlers of 
freight and general station employees than are tests of motor aptitudes was 
the conclusion of Lahy and Korngold (434: 140-47), who found that in- 
ability to adjust to an unfamiliar rhythm of work caused disorganization 
of mental and psychomotor reactions. Marbe (486) concluded that while 
simple tests might successfully differentiate accident prone from accident 
free school children, such tests will not suffice for industry, because acci- 
dent proneness in an occupation depends in part on the nature of the 
occupation. 

Employability—One of the emerging trends in aptitude testing has been 
the measurement of the relation of more or less prolonged idleness to intel- 
lectual and work capacities. Gilbert (453) investigated the extent to which 
increasing age affects the mental alertness of employed and unemployed 
people. The work of the Adjustment Service in New York City was reported 
by several investigators including Bergen, Schneidler, and Sherman (415), 
and Dodge (441). Morton (497) gave a battery of tests to unemployed 
men in Montreal. He found that unemployed relief clients, even with the 
age factor controlled, were inferior to unemployed non-relief workers, and 
that inferior test performance tends to be associated with the longer periods 
of unemployment. Trabue and Dvorak (530) reported an extensive study 
of the unemployed who registered at the Occupational Analysis Clinic at the 
University of Minnesota. Rempel (517) classified unemployed adults into 
four groups on the basis of visual acuity as measured by the Snellen chart. 
He did not find a significant relationship with various aptitude tests. 

Military service—In studies of army and navy aptitudes, foreign workers 
have directed their efforts toward detecting officer material among recruits 
(410, 471); selecting the superior artillerist (488); locating best materials 
for specialists in the army (483, 538); and finding the soldier who will 
make the best foot courier (499). 

Occupational groups—Among the Minnesota reports on the use of tests 
is that by Dvorak (444), whose aim was to differentiate individuals in a 
given occupational group from the general population and from other 
occupational groups. Vocational guidance is given by comparing an indi- 
vidual’s ability profile with the ability profile of a given occupation ex- 
pressed in the same terms. Fontegne (448) constructed a vocational pro- 
file on which are listed the various aptitudes required for given details of 
analyzed tasks. Teegarden (528), using a battery of manipulative tests to 
measure such factors as speed, accuracy, handedness, delicacy of control, 
ability to follow demonstrations or instruction, and so forth, found that 
test performance of adults in fourteen men’s and sixteen women’s occupa- 
tions show no two occupations presenting identical combinations of test 
levels on the tests used. Dodge (441) divided 651 clients of the New York 
City Adjustment Service into thirteen occupational groups, administering 
four tests and finding considerable overlapping and variability in mean 
scores. That there is a fairly definite occupational hierarchy based on the 
level of intelligence of workers in various groups was pointed out by Wil- 
liamson and Darley (536), who called attention to the discrepancy between 
vocational choice and level of academic aptitude among Minnesota high- 
school seniors. All findings with regard to differences in trait averages must 
be interpreted with due regard for the variability of the trait in any occupa- 
tion and the overlapping with groups in other occupations. 

Police—Mata (487) reported on a comprehensive clinical, physiological, 
and psychological examination used to select police officers in Europe. 
Objective measures of emotional stability yielding an “index of imperturb- 
ability” form part of the battery. There is a test for details in testimony, 
as well as tests of practical judgment and critical sense; imagination; 
simple and discriminative reaction time to visual and auditory stimuli; 
memory for faces, forms, and colors; rapidity and range of attention; 
physical measurements; and records of temperament, character, emotional 
stability, and habits of discipline. Mira (493) found that tests concerning 
testimony and general intelligence were the most selective out of a battery 
used in hiring police officers in Spain, and O’Rourke (503) reported on the 
range and distribution of scores for his Police General Adaptability Test, 
administered to 2,591 men in nineteen cities for the purpose of establishing 
standards. A comparison of the test scores and the efficiency of the Wash- 
ington police force showed that 80 percent of those appointed as a result of 
the examination are above the average in efficiency. 

Salesmen—Giese (452) classified prospective salespeople according to 
the way they approached and performed the task of arranging a display 
board of different products. Schultz (521, 522) reported the use of tests 
of general intelligence, extraversion, ascendance submission, and interests 
in the selection of life insurance salesmen. He reported the best predictor 
to be intelligence if above the 20th percentile; ascendance and extraversion 
scores were “moderately” successful. Lawe and Raphael (479) announced 
the finding that “a very high level of intelligence seems to be an actual 
disadvantage in a female sales assistant.” The most comprehensive and 
thorough study was reported by the U. S. Employment Service (533). 


The Need for Improved Criteria 


Reports of the predictive value of the same aptitude test range from low 
to high, and give us little basis for judging the comparative value of tests 
or even the validity of a single test. An explanation of the criterion and the 
factors used in it should accompany every report of a test program. When 
this is done, and when it is possible also to report the reliability of the 
criterion and to give some idea of its validity, the literature on testing will 
become more significant. The reporting of a coefficient of correlation be- 
tween test scores and a superficial criterion is not a contribution to litera- 
ture; it merely adds so many more pages of confusion. Some reasons for 
conflicting reports on the relation between efficiency and test scores or other 
data are: 


1. Those who evaluate the tests include different factors in their efficiency ratings 
for the same class of positions. Correlations between test scores and these different effi- 
ciency ratings might well be expected to be different. (a) Sometimes the criterion refers 
to routine work; at other times it refers to work requiring more judgment. (b) Some- 
times the criterion is based upon actual production records; at other times such fac- 
tors as interest, effort, and personality have a part in it. In the former case, a higher 
relation should naturally be found between efficiency and scores on carefully con- 
structed tests. If personality, temperament, and other general factors which the test 
does not measure enter into the total efficiency rating, the correlation will be lowered. 
For instance, the factors reported as office efficiency by various personnel men differ 
widely as to the relationships which are reported between test scores and office effi- 
ciency. A battery of tests may correlate .24 when a composite criterion including per- 
sonality, interest, tardiness, and so forth is used, and the same test may correlate .64 
when speed, accuracy, or production records alone are used. 

2. The factors in efficiency ratings are often given different weights by different 
raters. The relation between test scores and efficiency ratings will vary according to 
the weights assigned the factors involved. Some raters will emphasize speed; others, 
accuracy. 

3. Examiners often use tests which, although bearing similar labels, such as “clerical 
tests,” really are quite different owing to the differences in the range of the test or 
in the difficulty at which the test is focused. 

4. Some persons construct tests to measure factors that analyses of duties have shown 
to be important. Others simply go ahead without a thorough analysis of the duties of 
the position, hoping that with at least some of the tests they will find a significant 
relation with efficiency. 


5. The range of potential and actual ability of the groups tested varies widely. 
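
Reasons 1 and 5 above can be illustrated numerically. The following is a minimal sketch, not a procedure taken from any study reviewed here. It assumes that an efficiency rating is the simple sum of a part the test measures and an uncorrelated part it does not, and it uses the standard formula for direct restriction of range on the test. The function names and the variance figures are hypothetical.

```python
import math


def diluted_correlation(r_measured, var_measured, var_unmeasured):
    """Correlation between a test and a composite criterion.

    Illustrative model: criterion = measured part + unmeasured part,
    the two parts are uncorrelated, and the test correlates r_measured
    with the measured part and zero with the unmeasured part.
    """
    total_var = var_measured + var_unmeasured
    return r_measured * math.sqrt(var_measured / total_var)


def restricted_correlation(r_full, sd_ratio):
    """Correlation expected in a group selected directly on the test.

    sd_ratio is the test's standard deviation in the selected group
    divided by its standard deviation in the full group.
    """
    u = sd_ratio
    return r_full * u / math.sqrt(1.0 - r_full**2 + (r_full**2) * (u**2))


if __name__ == "__main__":
    # Production records alone, then a rating that also carries
    # personality, interest, and tardiness (hypothetical variances).
    print(round(diluted_correlation(0.64, 1.0, 0.0), 2))
    print(round(diluted_correlation(0.64, 1.0, 3.0), 2))
    # The same test in a group with half the spread of ability.
    print(round(restricted_correlation(0.64, 0.5), 2))
```

Under these assumptions, a test correlating .64 with production records would correlate about .24 with a composite rating only if the unmeasured factors carried a little more than six times the variance of the production component. The sketch shows the direction and rough size of the effect; it does not reconstruct any reported result.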
CHAPTER VI 


Personality and Character Measurement¹ 


GOODWIN WATSON 


IT IS AN INTERESTING CHALLENGE to attempt in a few sentences to give the 
gist of the impression made by a review of several hundred studies of per- 
sonality and character measurement. No important new approach has 
emerged during the past three years. The superficial use of self-report ques- 
tionnaires has continued to be common. There are, however, distinct signs of 
increasing critical maturity in personality research. Ratings are used with 
admirable caution. The trend toward molar, organismic conception of the 
personality as a whole is clearly evident, and is exemplified by the extraor- 
dinary attention to the Rorschach test. Some interesting attempts to use 
factor analysis to discover units of organization more valid than “traits” 
have not as yet been based upon data obtained through observation which 
accords closely with the complex and telic organization of human behavior. 
Very encouraging have been a few studies combining a wide variety of ap- 
proaches—observational, metrical, clinical, statistical, social, and theoreti- 
cal—to expand the understanding of one aspect of personality, seen in rela- 
tion to the rest of the personality and the cultural milieu. 


Previous Reviews 


The first résumé of personality testing in the Review of Educational 
Research occupied an entire number (June 1932), with 282 references. 
The résumé in the second cycle (June 1935), consisted of four chapters, 
and included 416 references. 

Maller (702) reviewed character and personality tests for the Psycholog- 
ical Bulletin in 1935. Odbert (732) summarized contributions in this area 
from the 1936 meeting of the American Psychological Association. G. 
Murphy, L. B. Murphy, and Newcomb (724) have given an extensive 
critical evaluation of the present status of, and results from, personality 
testing. Their chapter XIII covers with equal thoroughness the field of 
social attitudes and their measurement. Stagner (789) summarized findings 
in experimental studies of personality, while Allport (543) made a major 
contribution to the literature in English on fundamental concepts of trait, 
type, and personality. Among the materials in other languages we have 
noted reviews by Wang (835) in Chinese: by Graf (630) and Biasch (559) 
in German; and, more superficially, by Wallon (833) in French. 

Some excellent theoretical discussions have appeared indicating, perhaps, 
the growing scientific maturity of personality study. B. S. Burks (570) 
criticized the trait concept and suggested that what we want to get at is a 
deeper and more general determiner of conduct which seems in fact to 
make prediction of the intent of behavior more valid than prediction of 
specific acts. She has suggested the term “nucleus” or “radix” for this un- 
derlying coherence in personality structure. Terman (813) indicated that 
the word “measure” is inappropriate in the absence of true zero points and 
equality of units. The organismic nature of personality suggests a clinical 
rather than a test approach. Certainly there is little reason to expect any 
“statistical shortcut” which will give quick and dependable understanding 
of personality. Further critical discussion was offered by Cattell (579) and 
Vernon (830). Reviews of typological theory are presented in Allport’s 
book mentioned above, and in articles by Spearman (783), Bryn (568), Ver- 
meylen (827), and Hofejsi (590: 744-47). With the political change in 
Germany has come a marked decrease in psychological contributions and 
a tendency to identify typological concepts with racial distinctions. There is 
not much evidence that typological theories are being taken up seriously or 
extensively in other cultures. 

¹ Bibliography for this chapter begins on page 340. 


Self-Report 


The most common practice in personality testing at present is to submit 
a series of questions asking the subject to evaluate his own symptoms and 
characteristics. Basic criticisms of this sort of instrument have been re- 
peatedly offered. It is well known that a “good” score on a neurotic inven- 
tory may be due to deceptive pretense, and may represent distressing mal- 
adjustment. It is well known also that subjects are frequently unaware of 
their own underlying personality characteristics, and may in all honesty 
vigorously deny those failings which they unconsciously resent in them- 
selves. There are other less serious criticisms which have been made. For 
example, the questions are often ambiguous; the units of score are by no 
means equal; and the categories under which responses are grouped are 
often merely linguistic and arbitrary, with little psychological coherence. 

Investigation fairly consistently confirms the low worth of the most 
widely publicized instruments. Jarvie and Johns (658) tried for several 
years with improved technics to get counselors who knew students inti- 
mately to rate those students on categories of the Bernreuter test. Correla- 
tion between these ratings and the test scores ranged from -.15 to .14 in 
one student, .23 to .40 in another, and -.14 to .23 in the third. “We are led 
to conclude,” they wrote, “that the Bernreuter Personality Inventory offers 
little aid in the isolation of personality problems.” Moran (719) found no 
differentiation between Thurstone inventory scores for 146 “adjusted” and 
41 “known neurotic” students. Hanks (634) tried to predict from autobiog- 
raphies to questionnaire tallies on “conventionality,” “attitudes,” and “per- 
sonality” with no very encouraging results. Landis, Zubin, and Katz (682) 
matched normal and very abnormal (mental hospital cases) individuals for 
age, intelligence, schooling, occupation, etc., and found the Bernreuter 
Personality Inventory and the Page Questionnaire of Schizophrenic Traits 
both worthless for differentiating the groups. Maller’s “Character Sketches” 
showed difference in group means, but considerable overlap of individual 
“scores.” According to Benton (557) both normal and abnormal subjects 
found a substantial proportion of items to be ambiguous. Williams, Kep- 
hart, and Houtchens (847) reported that a change in method of administer- 
ing such a self-report brings substantial reversals in responses and wide 
variations in scores. This was to be expected. Students in cheerful mood 
gave a more “dominant” picture, according to W. B. Johnson (661), than 
did the same subjects on the same blank when they happened to be in a 
depressed mood. Olson (735), Moore (718), and others showed that anony- 
mity in various degrees alters frankness and scores; but it is unknown how 
much misrepresentation remains in the anonymous reports. Lentz (689) 
found reason to believe that not more than 60 percent of the Bernreuter 
items are reliably marked, with most fluctuation in scores near the mean and 
on items checked with average frequency. Lorge and others (694) showed 
that there is more consistency actually in the tendency to answer “Yes” or 
“No” or “?” to any kind of question in these blanks, than there is to show- 
ing the same assumed trait by answering “Yes” to one question and consis- 
tently “No” to one of opposite import. Burnham and Crawford (572) car- 
ried on a most amusing investigation, using a pair of dice as subject. Ten 
testings revealed that the dice (i. e., chance scores) were emotionally mal- 
adjusted, introverted, and (on the Strong Vocational Interest Test) pos- 
sessed interests like a journalist or a Boy Scout leader. 
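
The distinction Lorge and others draw can be made concrete with a small tally. The sketch below is only an illustration under stated assumptions: it supposes that the blank contains pairs of items of opposite import, and it compares how often a subject simply says “Yes” with how often the paired answers actually agree in meaning. The function name, the data layout, and the sample answers are hypothetical and are not taken from the studies cited.

```python
def response_set_summary(pairs):
    """Summarize answers to pairs of items of opposite import.

    pairs is a list of (answer_a, answer_b) tuples, each answer being
    "Yes", "No", or "?". Item b is worded in the opposite direction to
    item a, so a consistent trait shows up as Yes/No or No/Yes.
    """
    answers = [answer for pair in pairs for answer in pair]
    yes_rate = answers.count("Yes") / len(answers)
    # A pair is consistent in meaning only when the answers are opposite.
    consistent = sum(1 for a, b in pairs if {a, b} == {"Yes", "No"})
    return {
        "yes_rate": yes_rate,
        "trait_consistency": consistent / len(pairs),
    }


if __name__ == "__main__":
    # Hypothetical subject who tends to say "Yes" whatever is asked.
    subject = [("Yes", "Yes"), ("Yes", "Yes"), ("Yes", "No"), ("Yes", "?")]
    print(response_set_summary(subject))
```

A subject with a high rate of “Yes” answers and a low consistency across opposed items is displaying a habit of response rather than the trait the items are assumed to measure.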

In spite of such findings, the available tests of this type are expanded. 
Bernreuter (558) prepared a new manual including Flanagan’s invention 
of two scoring scales assumed to be “confidence in oneself” and “sociabil- 
ity,” and which, according to his factor analysis, accounted for all the pre- 
vious four scores. Shlaudeman (775) added another way of tallying the 
responses which he called an “Idiosyncrasy Scale.” Conway (591) made 
some strips to speed the counting. Shen and Liu (772) made a Chinese 
version. L. A. Thompson (816) used scales as Laird had done earlier, 
rather than Yes-No answers. Stefanescu-Goanga, Rosca, and Cupcea (791) 
made a Roumanian adaptation of Woodworth’s psychoneurotic inventory. 
Bell (555) made a study of home life, health, social relations, and emo- 
tional adjustment. Link (693) assembled 150 self-rating items to be checked, 
and grouped them for scoring about his idea of “extroversion,” “aggres- 
siveness,” “self-determination,” “economic self-determination,” and “sex 
adjustment.” He believed, for example, that children who say they like 
to go to Sunday School thereby demonstrate “extroversion,” while those 
who say they often lose their temper or dislike having others give them 
advice should be scored as lacking in “extroversion.” Dybowski (610) 
asked university students questions to reveal whether they were “strong- 
willed” or “weak-willed” in carrying out resolves. Williams and Chamber- 
lain (846) used the Allport Ascendancy-Submission test to study the devel-' 
opment of high-school girls, while Stevens and Wonderlic (796) questioned 
- its desirability in employment procedure. 


REVIEW OF EDUCATIONAL RESEARCH Vol. VIII, No. 3 


Hildreth (644) and Brown (566, 567) asked questions proper for ele- 
mentary-school children and Cowan and others (594) worked jointly on 
a variation of the Thurstone blank for adolescents. Pintner and others (745) 
made up their set of questions about school, teacher, classmates, self, and 
family. Symonds and Jackson (811) prepared a set for high-school pupils, 
adding, as Part II, a section in which students rate their fellows along the 
lines of May’s “Guess Who” technic. Stagner (790) juggled the Bernreuter 
items a little and came out with symptom counts around the terms “emo- 
tionality,” “persistence,” and two kinds of “introversion.” Pintner and 
Brunschwig (744, 747) modified typical questions to make them more 
appropriate for the deaf. Willoughby and Morse (851) found that spon- 
taneous reactions made by adults while taking some such personality in- 
ventory were of considerable psychological interest. Sex and guilt items 
most commonly aroused comment. Comparison of the self-revelation offered 
in the comments with the mark on the paper revealed that the written re- 
sponses of the subjects frequently gave a false impression. 

Results obtained by the use of symptom questionnaires, the validity of 
which is definitely questionable, are difficult to interpret. Hence Chou’s 
comparison (588) of Chinese versus American students, Pintner and 
Brunschwig’s comparison (746) of deaf children taught by different 
methods, Myers’ study (726) of home factors related to maladjustment 
in high school, the McCartney and Papurt testing (698) of tractables and 
intractables in a reformatory, and the comparison by Strecker and others 
(800) of senior medical students and unselected undergraduates, remain 
inconclusive. 

Closely related to personality questionnaires are a number of inquiry 
blanks which ask subjects about their emotions and wishes. Landis, Ferrall, 
and Page (683) queried normal and abnormal subjects about how much 
fear or anger would be aroused by each of forty situations. Means (711) 
asked a thousand college women how much afraid they were of snakes, 
cancer, fire, bulls, and 345 other stimuli. Hildreth and Keller (645) asked 
secondary-school pupils for their earliest memory, most exciting experience, 
greatest problem, greatest fear, and most common dream. Apparently no 
consideration was given to the conditions under which adolescents willingly 
reveal their inner life. Jacobsen (657) questioned the boys in one school 
class about their wishes for present and for adult life. Maslow (710) made 
extensive studies of dominance and found that there may be wide dis- 
crepancies between reports of dominance feeling and observed dominance 
behavior. Huth (655) revised the Bobertag-Hylla test increasing the self- 
description paragraphs to one hundred. Answers were qualitatively evalu- 
ated which reduced to some extent the errors associated with blind—objec- 
tive—counting of check-marks. Maller (703) put self-description phrases 
on cards, and children were asked to sort out into boxes labelled “Yes, I 
am the same,” or “No, I am different.” This technic, first invented by Hall, 
relieved slightly the tedium of checking a questionnaire, but did not add 
otherwise to validity. Williamson and Darley (849) asked 2,500 students 
to indicate the frequency with which they enjoy social contacts and initiate 
social relationships. The mean of a group which had reported active and 
satisfactory social life on a previous questionnaire was nearly one sigma 
above the mean of a group previously reporting poor social relationships. 

The Humm-Wadsworth Temperament Scale (651, 652) contains 318 
items which have been empirically determined to be related to normal, 
hysteroid, cycloid-manic, cycloid-depressed, schizoid-autistic, schizoid- 
paranoid, and epileptoid diagnoses. Washburne’s Test of Social Judgment 
(837) contains self-report questions on purposes, social relations, concern 
for others, emotional stability, and preference for long-term greater rewards 
rather than immediate lesser rewards. Retest reliability over .90 was re- 
ported on a college group, and bi-serial coefficient of .90 showed excellent 
discrimination between adjusted and maladjusted extremes. One of the 
most hazardous situations in which to rely upon self-report would seem to 
be with prisoners in an attempt to determine which are worthy of parole, 
but Laune (687) is trying out a questionnaire containing “Yes”-“No” 
items related to success on parole. 

A classical attempt to disguise the purpose of the inquiry so that subjects 
might not easily influence their scores was the Pressey X-O. Durea (607) 
studied items checked by delinquent as compared with non-delinquent boys, 
and found the delinquents were more worried over “death,” and “sin,” more 
attracted by “movie star,” “joy riding,” “tap dancing,” and “candy,” and 
more apt to admire wealthy, handsome, and well-dressed people. Symonds 
(808, 810) asked adolescents and adults to rank suggested interests and 
problems in order of importance. Men rated higher interests in health, 
safety, money, and success; women gave higher rating to personal attrac- 
tiveness, manners, home and family relationships. The low rating given to 
sex should probably be interpreted not as evidence of its unimportance, 
but as further evidence for the distortion of self-report by inadequate self- 
understanding and by desire to give a conventionally favorable impression. 
Differences in maturity between girls of the same chronological age before 
and after first menstruation were studied by Stone and Barker (799) who 
made use of several measures. The Bernreuter test did not reveal differences 
but the Pressey Interest Attitudes Test and the Sullivan Scale for Measuring 
Developmental Age in Girls showed the psychological maturity which 
accompanies endocrine changes. 

Mira’s confidential questionnaire concerning affective memories, life- 
attitudes, inferiority feeling, methods of dominating, sense of guilt, etc., 
has been worked out with considerable characterological insight. The 
questionnaire was given by Alier i Gomez (542) to 337 Spanish subjects. 
He recommended taking account not only of answers but of attitude toward 
the questions. Wolff (855) made a more intensive study of twenty-five 
children and twenty-four young people who consulted the Psychotechnic 
Institute for vocational guidance, but the number of cases would hardly 
justify his generalizations about age and sex differences. Stavél (590: 378- 
85) suggested to the Eighth International Conference on Psychotechnics 
a standard procedure to include: (a) graphological analysis; (b) question- 
naires based on Jungian types, Kretschmerian types, and psychoneurotic 
symptoms and interests; (c) personal history; (d) observation and diagno- 
ses of behavior; and (e) interview with the subject. The interview is be- 
lieved likely to be much more rewarding after these preliminary examina- 
tions. 


Euphoria—Satisfaction and Happiness 


One of the chief objectives of education and of social organization is 
personal happiness. This important aspect of life cannot be measured exter- 
nally because of the poor correspondence between inner feeling and the 
impression one succeeds in making upon others (636). Self-report must 
be used; hence this measure, like those preceding, is in considerable degree 
dependent upon the achievement of satisfactory rapport. It is probable that 
reports of unhappiness are less contaminated with pretense than are reports 
of happiness. 

Some investigators have asked directly about general state of happiness. 
Symonds (809) used a seven-point rating scale, and found the happy stu- 
dents more concerned with affairs outside their own personal problems. 
(So science accords with the philosophy expressed in the New Testament!) 
Chant and Myers (583) used the Thurstone scaling procedure on twenty- 
two statements ranging from extreme depression to extreme elation. Dis- 
tribution among normals was skewed toward the euphoric end; the cyclic 
patients were distributed in a U curve, schizophrenics were more unhappy 
than is sometimes assumed. Young (862, 863) found that college students 
rated their mood as cheerful five times as often as depressed, with laughter 
occurring many times a day and weeping only about once in three weeks. 
Barry and Bousfield (547, 564) asked subjects to rate euphoria on a 
scale of ten categories and compared results with the number of pleasant 
versus unpleasant associations the subject could produce. They found the 
two measures in substantial agreement. Women seemed happier than men, 
a finding which the authors relate to the fact that women reported an 
average of an hour more of sleep per twenty-four-hour day. 

Hoppock (647, 648) studied job satisfaction among all the inhabitants 
of a Pennsylvania town and found that a large majority enjoyed their work. 
A second study based on industrial psychologists showed that they were 
closely comparable to other professional and executive employees. Appar- 
ently their psychological understanding had not produced greater success 
in obtaining satisfaction; neither had it made them unusually morbid or 
depressed. 

Burgess and Cottrell (569) and Terman and Buttenweiser (814) devised 
questionnaires, the items of which could be scored to measure satisfaction 
in marital adjustment. No relation was found between happiness and age 
at marriage, differences in age, or number of children. 

Boder and Beach (562) secured information of special value to edu- 
cators. Adolescents were asked to answer anonymously what government, 
parents, school, or church might do to increase the happiness of young 
people. Most demands were made upon the school. Of the demands upon 
the church, half were for recreational features. 

Crook (596) presented to students Dunlap’s dilemma: “If the only 
alternatives for your future were: (1) living your life over exactly as it 
had been lived, with no knowledge of your future as you went along, or 
(2) being painlessly and permanently extinguished, which would you 
choose?” About 14 percent of 220 white students and 21 percent of 107 
Negro students chose extinction. When the dilemma was rephrased to in- 
clude only the past two years, the percents were respectively 11 and 19. 
Differences between men and women students were not significant. 


Interests 


Questionnaires concerning interests also rely upon self-report, but under 
ordinary circumstances subjects are apt to be able and willing to report 
their conscious interests. Limited experience, of course, may make it impos- 
sible for a subject to know whether or not he would like a given activity 
if he tried it. 

A basic study of interests was made by Thorndike (817, 818, 819). The 
evidence showed that individuals believe that they have changed sur- 
prisingly little in interests between the ages of twenty and fifty. An impor- 
tant fact in interest measurement (like the observation reported above by 
Lorge on the Bernreuter) is that people have somewhat consistent tenden- 
cies to check L (Like) or D (Dislike). This general trend toward positive 
or negative report may have to be taken into account in interpreting any 
given interest. Weedon (838), in a dissertation, reviewed interest measures 
and raised fundamental questions concerning the observation and interpre- 
tation of behavior to indicate interests. 

Most famous of interest tests has been the Strong Test of Vocational 
Interests. During the period under review a Vocational Interest Blank for 
Women has been published (803). Young and Estabrooks (860, 861) 
developed a scoring scale to use the Strong test as an index of studiousness 
(i. e., factors other than intelligence making for high academic grades). 
It, however, showed an average correlation of only .35 with scholastic 
standing. Williamson (848) and Mosier (722) found the studiousness 
scale of some slight value in arts colleges, but of less predictive value than 
high-school scholarship, and of no value in technical or business schools. 
Newbury (728) selected items from the Strong and Miner tests to predict 
grades in a psychology course. Strong (802, 804) has compared interest 
responses of men and women and derived therefrom a masculinity-feminin- 
ity score. Kelly, Miles, and Terman (665) found that subjects instructed to 
be as masculine or as feminine as possible could at will produce very great 
shifts in score. 

Interests of children were tested by pictures of play materials (826) 
and of men performing various types of work (628). Lehman and Witty 
(688) gave their questionnaire on occupational interests to nearly 27,000 
boys and girls. Stability of vocational interest among children aged eleven 
to fourteen was checked by Lahy (590: 356-63). Baumgarten used the fa- 
miliar method which requires children to select book titles which would 
interest them most. Her findings were published in various European pub- 
lications (549, 550, 551, 590: 323-28). 

Inventories for the guidance of youth in matters of education, health, 
recreation, civic functions, and vocation have been standardized by 
Kefauver and others (664). Williamson and Darley (850) compared voca- 
tional choices of high-school students in Minnesota over the period 1929-33, 
showing, during that period, some decrease in choice of engineering and 
increase in agriculture, forestry, and skilled trades. The concentration upon 
a few choices evidences our continued failure to give youth a good knowl- 
edge of occupations. Neubauer (590: 446-52) reported at the Prague Con- 
ference on Psychotechnics inquiries into occupational preferences of stu- 
dents in high schools, while Baumgarten and Zürcher (590: 393-99), Se- 
racky (770), and Grawitz, Laugier, and Weinberg (631) questioned youth 
with unflagging curiosity, not only about occupations but also about pleas- 
ures, reforms, life aims, life philosophy, emotional problems, tastes, etc. 
Reading interests were appraised by Altstetter (544) who found that teach- 
ers are believed not to have played a major role in influencing reading 
choices, and by Morgan and Leahy (721) who assigned “culture” weights to 
current magazines. 


Opinions and Attitudes 


The validity of attitude scales depends in part upon the achievement of 
clarity in the test, and in part upon rapport with the examiner. Only rarely 
does a report upon attitude testing use sufficient care to describe the situa- 
tion and the way a relationship was built up within which trustworthy 
responses might be expected. 

Concepts used in attitude measurement have been criticized and defined 
by Kulp (681) and Kirkpatrick (671). A general discussion in Chinese, 
based on Thurstone’s scales, was prepared by Wang (834). 

One development in attitude scales, mentioned in the June 1935 issue of 
the Review of Educational Research and continuing since, has been the use 
of a generalized or “master” scale which can be applied to any of a class 
of objects (755). One number (756) of the Purdue University Studies in 
Higher Education presented scales for measuring attitudes toward any in- 
stitution, for example, war, communism, marriage, Sunday observance; 
another scale for measuring attitude toward any defined groups of people; 
another for attitudes toward any homemaking activity; another toward any 
social or personal practice, such as drinking or petting; another toward 
any occupation or any school subject. A second series of studies (754) 
appeared in 1936 reporting student attitudes toward basic freshman 
studies; presenting rather unconvincing evidence on the validity of the 
Miller scale for attitude toward vocations; surveying attitudes toward 
recent economic policies (the subjects favored a gold standard, old age 
pensions, and a thirty-hour week!); measuring appreciation of poetry 
in accord with professors of literature; describing a scale for measuring 
attitude toward any teacher, and another for attitude of audience toward 
any play; and a three-axial scale to show approval of the objectives, organ- 
izations, and personal participations related to any social activity. Whisler 
and Remmers (842) in another article described a scale for group or 
personal morale. 

Attitude tests used during the 1935-37 period were classified as to subject. 
The largest group reported in psychological journals dealt with attitudes in 
the relationships of children and adults. The extensive use of attitude tests 
for parent education and teacher rating represents a useful cross-fertiliza- 
tion of scale construction with another phase of applied psychology. 
Koch and others (678) used the Thurstone scaling technic to measure 
attitude concerning the amount of freedom children ought to have. Stogdill 
(798) found psychologists approving more freedom, and parents less, 
with students intermediate. Ellis and Miller (613) revised Wickman’s scale 
and technic and obtained a higher correlation between teacher judgment 
and mental hygienists. Fitz-Simons (617) developed a guide for scoring 
case studies in terms of parent-child relationships. Ackerley and others 
(540) found attitude scales in agreement with findings from personal 
interviews. Duplicate and disguised questions were used to check on present 
practices. Hedrick (639) used the Ojemann scale (733) to measure atti- 
tudes toward self-reliance in children among a group of parents before and 
after six weeks of training. The effect appeared to have spread beyond the 
confines of subjectmatter covered in instruction. Stogdill (797) summarized 
twenty-eight studies of the attitudes of adults toward children which 
appeared during the first generation of the twentieth century. A particularly 
helpful use of tests and questionnaires for educational purposes was 
Butler’s study (573) of 1,586 high-school pupils to discover what they 
knew and believed about family relationships and child development. 

Several investigators have been concerned with children’s attitude toward 
parents. Stagner and Drought (788) developed a scale, using the Thurstone 
technic, which accorded with self-ratings and biographical sketches. The 
article lapses at one point into the questionable assertion that since there 
is a positive correlation of .17 between attitude toward father and attitude 
toward mother, and since no difference appears between men and women 
in this respect, therefore the Freudian theory of the Oedipus complex is 
incorrect. It is unfortunate that studies which presume a disagreement with 
psychoanalysis so often attack a position which no Freudian would try to 
defend. Meltzer (713) obtained more relevant data by asking 150 children 
to think out loud ten free associations following each stimulus word. 
“Father” and “mother” occurred in the list. The results were analyzed as 
well as counted. DuVall (608) used Bogardus’ Social Distance technic to 
study closeness of children to their parents. Well-adjusted children were, 
as would be anticipated, closer to both parents. M. Simpson (778) did an 
especially thorough and penetrating analysis of parent preference among 
young children. She used direct questions, questions about pictures, about 
stories, and about dreams. All sex and age groups except five-year-old girls 
showed more mother-preference. Peterson (754: 127-44) correlated atti- 
tudes held by parents and children on current social questions. The greatest 
resemblance was found within the same generation, for example, parent 
with parent or sib with sib. Again the greater influence of the mother was 
shown. Mott (723) used three questions from the Rogers test to study 
mother-father preference in children, finding that both sexes preferred the 
mother but, of the minority, more girls than boys rated father ahead of 
mother. 

Attitude toward loved ones was explored by Mangus (705), who asked 
700 college women to rate their fathers, boy-friends, and ideal mates. Ideal 
mates resembled boy-friends more than fathers which, according to the 
author, militates against an Elektra complex. Apparently Mangus did not 
take into account the age of father and attitudes of child when the Elektra 
complex is formed, or allow for the obvious fact that boy-friends resemble 
mate-ideals in age characteristics. 

Attitude testing has recently become a commercial enterprise. The Gallup 
poll is supported by weekly reports featured in more than seventy news- 
papers. Fortune has aroused considerable interest with its “Quarterly Poll 
of Public Opinion.” During the period of this review a large number of 
straw-votes and opinion measures attempted to predict the 1936 presidential 
election. The ignominious failure of the immense Literary Digest poll, 
which had for so long been phenomenally correct, emphasized the dis- 
torting effect of a sampling error. Public opinion polls since that time 
have been more concerned with the representativeness of the sample than 
with large numbers of votes collected by a procedure which may involve 
constant errors. 
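
The contrast between a constant error and an error of small numbers can be sketched in a few lines. The example below is an illustration only: it assumes a population divided into two groups that differ in their support for a candidate, and a mailing list that draws disproportionately from one of them. The group labels and all figures are hypothetical and are not the actual 1936 returns.

```python
import math


def frame_estimate(support_by_group, frame_share):
    """Expected poll result when the sampling frame has the given make-up."""
    return sum(support_by_group[g] * frame_share[g] for g in support_by_group)


def sampling_standard_error(p, n):
    """Standard error of a proportion from a simple random sample of size n."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Hypothetical support for a candidate within each group.
    support = {"listed": 0.40, "unlisted": 0.70}
    population = {"listed": 0.30, "unlisted": 0.70}
    mailing_list = {"listed": 0.80, "unlisted": 0.20}

    true_value = frame_estimate(support, population)
    biased_value = frame_estimate(support, mailing_list)

    # Constant error of the unrepresentative list, unaffected by its size.
    print(round(biased_value - true_value, 3))
    # Chance error of two million ballots versus three thousand interviews.
    print(round(sampling_standard_error(biased_value, 2_000_000), 4))
    print(round(sampling_standard_error(true_value, 3_000), 4))
```

Under these assumed figures, enlarging the unrepresentative sample shrinks only the chance error, which is already negligible, while the constant error stays as large as ever. A much smaller sample drawn in proportion to the population carries a chance error of under one percentage point and no constant error of this kind.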

Stagner (786) emphasized another important technical consideration in 
his studies of Fascist attitudes. People who strongly disapproved of Nazi 
Germany or Fascist Italy often unwittingly shared many of the basic 
Fascist beliefs. Koeninger (679) reported that consistency of “radical,” 
“liberal,” or “conservative” attitudes was rare among high-school seniors. 
Lentz and others (690) published a C-R opinionaire to study conservatism- 
radicalism among college students. The test can also be scored to show 
deviation from majority responses, or what Lentz calls “minority-minded- 
ness.” Maller and Tuttle (704) developed a test on contemporary social 
problems, including sections on probable consequences, beliefs accepted 
or rejected, news items of social policy, pleasant and unpleasant sugges- 
tions, selection of vital factors in civilization, and attitude on certain prob- 
lems and toward certain groups. Data from four colleges (820, 821) indi- 
cated that the more socially-minded students read progressive magazines, 
worked their way through college, were active on projects off the campus, 
and took courses with liberal teachers. 

Attitudes toward persons of other races were studied by the Bogardus 
Social Distance technic and checked against personal interviews with 
fifteen children. Zeligs and Hendrickson (864) found 87 percent agreement 
between test and interviews. Dodd (604) used a good combination of 
scaling technics. General statements of attitude ranging from friendly to 
hostile were first scaled by the Thurstone procedure and five were selected. 
These five were then applied as a social distance measure to fifteen national 
groups, eleven religious groups, and eight other groupings. Zeligs (865) 
asked sixth-grade children to write the most interesting true sentence they 
could think of about each of thirty-eight national or racial groups, thus 
revealing popular stereotypes. Horowitz (649) studied attitudes of chil- 
dren from kindergarten through eighth grade in three radically different 
cultures, with results which illustrate exceptionally well the way in which 
race attitudes are taken over from the social environment. Rosander (759), 
through a scale of attitude toward the Negro using statements describing 
behavior rather than opinion, maintained that although behavior descrip- 
tion scales and opinion scales agree closely, the former gives a clearer, 
sharper picture of the individual reactions. Davis (600) studied attitudes 
of 232 Negro students toward Negro traits, Negro militancy, and Negro 
occupations. Baumgartner (552) followed Likert’s technic of building a 
scale to measure the racial self-respect of the Negro. 

Several investigators have used scales to measure religious attitudes. 
Wilson (852) asked subjects to rate the extent to which influence of rela- 
tives, reading, voluntary service, seasons, solar system, plants, animals, 
and other factors may have been religiously helpful or harmful. Franzblau 
(620) found Jewish children who accepted religious beliefs to be less 
intelligent and less adequate in character. A “Religious Ideas Test” was 
developed for his study. Woolston’s questionnaire (856) covered theo- 
logical beliefs, religious practices, and cooperation between religious sects. 
Sturges (806) drew up a similar questionnaire on beliefs (orthodoxy) and 
practices (piety). Individual scores for orthodoxy showed a correlation of 
about .5 with piety. Kirkpatrick and Stone (670) evolved a scale of seventy 
statements to appraise religious attitudes of educated groups. Both Kirk- 
patrick and Woolston found parents to be more religious than offspring. 
Binnewies (560) submitted to students questionnaires on God, prayer, the 
Bible, Jesus, creation, and immortality, and found the more advanced stu- 
dents to be the least orthodox. Cristescu (595), under the auspices of the 
Roumanian Social Institute, formulated a questionnaire on magical beliefs 
and practices to use in connection with a comprehensive survey of village 

Two scales for militarism-pacifism were reported. Miller (715) criti- 
cized the Peterson-Thurstone scale because, he claims, most subjects check 
items which range over a considerable portion of the scale. Zubin and 
Gristle (866) developed another scale which reliably distinguished R.O.T.C. 
from pacifist society members. Interesting evidence of the need for taking 
social influences into account has become apparent with changes of recent 
months in attitudes toward war. Subjects who, two or three years ago, 
scored alike in pacifism might well be far apart today because the groups 
(e. g., Fellowship of Reconciliation vs. Communist Party) in which indi- 
viduals belong have moved in different directions. The determinants of 
attitude are apparently better sought in social organizations and alignments 
than in the variables age, I.Q., college class, etc., so often correlated with 
score. 

Among the miscellaneous attitude scales reported during the period of 
review we note attitude toward: nursery schools (597); high-school disci- 
pline (754: 214-24); feminism (672, 673, 674); relief (741); employers 
(822); and criminal or other offenses (779, 843). Taylor (754: 192-202) 
studied attitudes of Negro pupils toward high-school subjects and teachers. 
Rundquist and Sletto (762) published their Minnesota Scale for the Sur- 
vey of Opinions which ranges over a variety of subjects almost as hetero- 
geneous as the present chapter. Attitudes toward personal adequacy, family, 
law, the economic system, education, and life in general are recorded. 
More comprehensive, but less objective, was the questionnaire used by 
Pintiliescu (743) to study attitudes of coeds in Roumania toward education, 
professors, fraternities, vocations, feminism, politics, religion, and family 
life. 

In some of the studies, the center of interest was in the effect of some 
factor upon attitudes; the scale was only incidental. Thus, Manske (706) 
found some evidence that ten “non-indoctrinating” lessons did influence 
pupils in a generally liberal direction, while the two statistically significant 
changes were both in the direction of the teacher’s own attitude. Peregrine 
(754: 55-69) found that printed material favorable to the Negro brought 
favorable change which persisted for at least two months. Remmers and 
Morgan (754: 109-14) used an anti-Nazi story which did not change atti- 
tudes, perhaps because the meaning of the story was not clear. Chen (585) 
followed up his study of propaganda concerning Manchuria and found 
that the influence of a fifteen-minute talk died out in five to six months. 
Wilke (845) compared direct speech, speech over a loudspeaker, and 
printed material, and found their degree of effect to be in that order. As 
usual, the radical attitudes proved more stable. College study liberalized 
attitudes as measured by the Manley Harper test, but particularly for 
students who pursued certain courses (563). College tended to make the 
students tested by Telford (812) more lenient in demands for punishment 
of criminals. Bateman and Remmers (754: 27-51) found it possible to shift 
high-school students by prepared instructional material, to favor social 
insurance, capital punishment, and the condemnation of labor unions. Just 
why this set of social objectives was chosen is not explained; presumably 
it illustrates the lack of rapport so often found between psychological ex- 
perimenters and educators. McConnell (754: 70-104) on the other hand, 
built his attitude test around educational objectives designed to help chil- 
dren act more effectively in dealing with four rural social problems. 

Hay (638) used an attitude scale before and after a debate which showed 
little residual change. This kind of finding illustrates a serious defect in 
most attitude testing. If the positions tested are thought of as lying on a 
horizontal scale from left to right, there is needed some measure along a 
perpendicular scale to distinguish “depth” of attitude. Two persons check- 
ing the same point on a horizontal scale may be quite unlike in super- 
ficiality versus profound concern. Hay’s audience after the debate may well 
have changed on a scale of concern in and knowledge about the issue, 
although not in average favorableness to one or the other side. According 
to Remmers (754: 105-108) “significant” change after a half-hour address 
on the League of Nations might have been interpreted, if a “perpendicular” 
scale had also been used, as due to a fairly ignorant, superficial, and “don’t 
care” status of attitude both before and after the talk. It is obviously easier 
to induce change in an attitude which does not have much depth. Another 
Purdue study by F. Peters and M. R. Peters (754: 15-26) showed a better 
attitude toward law in a school with pupil participation in government. 
An interesting social observation was made by Whisler and Remmers (841) 
following the 1936 presidential election. We recognize that attitudes in- 
fluence an election but apparently elections also influence attitudes. The 
winner was more popular after winning; the loser lost some of his former 
favor. Lorge (695) studied what might be called “susceptibility to prestige 
influence.” Subjects rated famous persons, rated quotations with no names 
attached, and then in a third test rated the same quotations attributed 
(truly or falsely) to some of the names. They found that high regard for 
the source of a quotation raised its value in the eyes of the student. Here 
again if a “depth” measure were used, important differences might appear. 
Sherif (773) asked subjects to rate short literary passages and found, as 
did Lorge, that when ascribed to an approved author, the standing of the 
passage was raised. 


Reputation, Rating 


The reviewer is impressed with the observation that although ratings are 
frequently made the basis for personality evaluation, a generation of criti- 
cism has made psychologists careful and critical in their use of such meas- 
ures. How long will it take, we wonder, before teachers, employers, and other 
persons less psychologically sophisticated will appreciate the limitations of 
the marks, grades, and ratings still so readily assigned in practice? Arge- 
lander (545) summarized some of the ways in which personality is mis- 
represented by ratings. Smeltzer and Adams (541, 781) compared the 
technic of narrative summaries with that of a graphic rating scale and 
found much less reliability for judgments based on the narratives. This 
appears to be due to the greater complexity of the narrative account, and 
it may be that the simple check on a graphic scale, while more reliable, 
is less valid. Sears (769) presented some evidence that persons who lack 
insight (i. e., whose self-ratings do not agree with ratings of them by others) 
also tend to project their trait characteristics onto others whom they rated. 
Wolf and Murray (854) investigated the records of five judges who had 
worked together for two years. They have presented an excellent set of 
principles which they followed in order to make their work more accurate. 

Ratings often play a part in college admission. Hartson (637) reviewed 
over a thousand cases and found the ratings on intelligence and methods 
of work most highly correlated with Oberlin scholarship. Ratings by prin- 
cipals were most valid. Bent (556) found that rating of student teachers 
by judges in conference gave scores which correlated more highly with 
other tests than did averages of the same number of ratings made indi- 
vidually. Page (737) found ratings on “leadership” at West Point highly 
correlated with “bearing and appearance” but not closely related to aca- 
demic standing, which may be a reflection on the academic marks or on 
the military standard of leadership. 

Ratings of school children have been reported on Baker’s Detroit Scale 
(546), a “Guess Who” questionnaire (700) and the Winnetka Scale for 
Rating School Behavior and Attitudes (824, 825). Olson (734) found that 
asking teachers to name the boys and girls causing most trouble located 
only about half the children who fall in the worst 10 percent by the Hag- 
gerty-Olson-Wickman Scale. The state department of public instruction of 
Michigan (714) prepared another scale for recording behavior of elemen- 
tary- and secondary-school pupils. Doll and his associates (601, 602) in 
more than a dozen articles of which we list only two, described the Vineland 
Social Maturity Scale, consisting of 117 items related to increasing social 
competence from birth to adult independence. Although these are ratings 
rather than tests, an age-level arrangement like the Stanford-Binet has 
been followed. Results obtained from two informants usually differ by less 
than half a year. 

“Adjustment” of children who had been studied in a child guidance 
clinic was estimated by parents, clinicians, and teachers. Carberry (575) 
found that all three judgments were necessary to make a comprehensive 
picture. Davidson (599) found that good adjustment was more likely with 
children under fifteen years of age, normal or above in intelligence, with 
school placement corresponding to mental age. 

A major factor in personality adjustment is acceptability among one’s 
fellows. Moreno fostered extensive analysis of social relationships, finding 
that delinquents placed in congenial groups make markedly better adjust- 
ment in many ways. A new journal entitled Sociometry is now being pub- 
lished to foster particularly sociometric studies. Lundberg and Lawsing 
(696) made a house-to-house canvass of a Vermont village and obtained 
the names of closest friends. “Isolates” (not chosen by anyone), “stars” 
(chosen by many “satellites”), “pairs,” “chains,” and other social relation- 
ships were identified. An interesting incidental observation showed a ten- 
dency to name, as friends, individuals of higher socio-economic status. 
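
The categories just named can be read directly off a table of friendship choices. The following is a minimal sketch, not the procedure of the study cited: the names, the choices, the function classify_choices, and the threshold of three choices for a "star" are all hypothetical. An "isolate" is taken, as in the text, to be a person chosen by no one, and a "pair" to be two persons who choose each other.

```python
from collections import Counter

def classify_choices(choices, star_threshold=3):
    """choices maps each person to the list of persons named as closest friends.
    star_threshold (an assumed cutoff) is the number of choices received
    that qualifies a person as a 'star'."""
    received = Counter(friend for named in choices.values() for friend in named)
    people = set(choices) | set(received)
    isolates = sorted(p for p in people if received[p] == 0)
    stars = sorted(p for p in people if received[p] >= star_threshold)
    # A pair is a mutual choice; each pair is stored once, in alphabetical order.
    pairs = sorted({
        tuple(sorted((a, b)))
        for a, named in choices.items()
        for b in named
        if a != b and a in choices.get(b, [])
    })
    return {"isolates": isolates, "stars": stars, "pairs": pairs}

# Invented data for five villagers.
example = {
    "Ann": ["Bea"],
    "Bea": ["Ann"],
    "Cal": ["Bea"],
    "Dot": ["Bea"],
    "Eve": ["Dot"],
}
result = classify_choices(example)
print(result)
```

In this invented village Bea is the only star, Ann and Bea form the only pair, and Cal and Eve, who choose but are not chosen, are isolates.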

Friendships among adolescent boys were studied by Pellettieri (742) 
with an “information finder,” with the usual emphasis upon similarity in 
age, proximity of homes, attendance at the same school, and interparental 
friendship. F. W. Burks (571) found that the George Washington Uni- 
versity Social Intelligence Test still shows the same lack of agreement it 
has demonstrated for years with sorority-sister ratings on social qualities. 
Soderquist (782) found ratings by associates to be reliable and valid 
measures of sociability among high-school students. Newstetter (729) 
checked choice of preferred association on a ballot with actual compresence 
as recorded by observers in a camp, and found a mean correlation of .73 
between the two measures. An especially important observation was that 
the best liked boys of the group were those to whom others were cordial, 
not necessarily those who were cordial to their associates. 


Objective Tests 


There continue to be many lines of approach to the objective measure- 
ment of personality—physiological, perceptual, intellectual, and the record- 
ing of behavior in controlled situations. Progress along this line has been 
slow, accompanied by the development of a few new concepts such as 
perseveration and level of aspiration. 

Physiological measures are usually disappointing in their correlation 
with complex behavior syndromes. G. N. Thompson (815) found blood- 
type unrelated to intelligence, introversion, or Pressey X-O scores. Omwake 
and her associates (736) found little relation between metabolism, blood 
pressure, and temperature, and intelligence, scholarship, activities, or 
Bernreuter scores. Hamilton and Shock (633) studied the acid-base balance 
of the body and found no large or consistent relationships, but some ten- 
dency for a correlation between instability and sub-breathing. They sug- 
gested that the physiological correlates, where found, may be result rather 
than cause of personality deviations. Gilkinson (629) reported that pitch- 
level of speaking voice showed a correlation of .3 or .4 with ratings and 
interests indicating masculinity, but hair distribution and skeletal propor- 
tions were unrelated to psychological masculinity. 

Five studies used Luria’s technic which involves voluntary hand move- 
ment coordinated with speech response to a stimulus word. J. W. Gardner 
(625) found words of sexual connotation differentiated by longer reaction 
time and more intense deflections in the record of the preferred hand pre- 
ceding and following the required response. Speer (784) checked motor 
responses to items from a symptom questionnaire. Runkel (763) perpe- 
trated an exciting incident and found critical words identified by verbal 
responses in one-half of the cases. Houtchens (650) found a bi-modal dis- 
tribution of scores for delinquents on delayed response, unusual response, 
voluntary motor response disturbed, and involuntary movement. Ebaugh 
(611) reported the graphic record useful in both diagnosis and treatment 
of some psychiatric problems. Yarmolenko (859) studied the precision of 
hand movements among normal and neurotic children, finding that the 
largest differences occurred in rather static situations. In psychoneurotics 
as contrasted with cases of organic disease, active response helped to bring 
better coordination. Langer’s tremograph may prove helpful in further 
studies of the relation of motor disorganization to personality (685). 

Lie detectors have been more popular in the press than in scientific 
literature during the triennium. Winter (853) found a cardio-pneumo- 
psychograph more reliable than verbal indicators in an association test 
applied to suspected thieves in a college dormitory, but recommended a com- 
bination of both types of measure. Just why Chant and Salter (584) sup- 
posed, in defiance of the long and impressive record, that the galvanic skin 
reflex would indicate the emotional nature of an attitude, they do not 
explain. 

Free association continues to be an interesting method for the explora- 
tion of personality, although there may be some doubt about its use as a 
“measure.” A. P. Johnson (660) standardized a new form of the Kent- 
Rosanoff test, intended to be comparable in results to the original. Laslett 
(686) has defended against criticisms his use of free association methods 
to differentiate delinquents. Carington (576) found that the association 
reaction time of mediums in the trance state was negatively correlated with 
the reaction time to the same words in normal state. This suggests the 
complementary nature of the trance personality. Siebert (776) showed how 
subjects who had recently been thinking of some emotional conflict tended 
to be influenced by it when giving free associations to a “neutral” word. 

From the Spearman school, studies of perseveration persist. Stephenson 
(792, 793) introduced the p factor in both British and American journals 
as the best available character tests. Cattell (580, 581, 582) improved the 
tests, eliminating any correlation with g among persons over ten years of 
age. A machine called a “perseverameter” may be used; paper-and-pencil 
tests which involve alternation of similar tasks are more common. A given 
test is valid only for its first application. Scores for p decline with age for 
adolescence, then rise to a stable adult score with no sex differences. 
Fatigue and conflict increase p score. Both extremely high and extremely 
low p scores seem to indicate inferiority of character (low w), and are 
common among delinquents and neurotics. After treatment and readjust- 
ment scores changed to moderate levels. 

Everall’s study (614) of perseveration in rats suggested a clinical inter- 
pretation of the tests. Apparently adaptive behavior regresses to persevera- 
tive if the adaptive efforts suffer too much obstruction. 

Perseverance was measured by Clark (589), who used word-building 
and number-building tasks, and also by Dorcus (605) in repetitive tasks, 
drawing a line slowly and solving puzzles. Clark found correlations with 
ratings which challenge the doctrine of specificity but Dorcus could find 
little agreement among the several tests. It should perhaps be noted that 
in this triennium no studies which used the old, pioneer Will-Temperament 
Test, devised by June Downey, came to our attention. 

Studman (805) continued a study of psychotics begun by Stephenson 
and Simmons, and found in addition to g, p, and w, a fluency factor, f, 
associated with elation, talkativeness, self-confidence, and excitability. 
Dybowski (609) found tests of perseveration correlated with ratings on 
negativism among adolescent girls. Kremm (680) compared a few city 
and country dwellers on ability to divide attention between two tasks, and 
to alternate from one task to another. 

Threshold of perception varies with the situation but may also indicate 
a personality factor. Bartlett (548) checked Travis’ finding on changes in 
auditory threshold during reverie, but with contradictory conclusions. Ac- 
cording to Bartlett, dementia praecox cases showed less change during 
reverie than did normal and psychoneurotic groups. Dahms and Jenness 
(598) agreed with Bartlett on the unreliability of threshold changes. A 
test of response to verbal suggestions on arm movement was more reliable 
but not related to the threshold changes. McDougall’s theory of introver- 
sion and extraversion as related to shifts in perception of a reversible figure 
was further studied by George (627) who checked the laboratory tests 
against a rating scale devised to eliminate the common correlation of extra- 
version with dominance and ascendance. Rau (752) used perceptual tests 
to classify racial groups (only 81 subjects altogether) in accord with 
Jaensch’s typology. Masaki and Otomi (709) observed the effect upon 
work of constant interruptions or of rejection of choices which the subject 
believed correct, and pointed to possible value for typology. 

Surprisingly few tests of the moral knowledge type have come to our 
attention during the period of review. Kinter-Remmlein (669) used an 
adaptation of the moral knowledge and also the conduct tests of the Char- 
acter Education Inquiry with 100 children in Paris. The French children 
performed best on moral knowledge, prediction of consequences, and co- 
operative behavior. Ackerley (539) used information items as well as 
attitude questions in testing understanding of child development by parents 
of elementary-school children. One section of the Moss Social Intelligence 
Test which deals with judgment in social situations was revised by O’Con- 
nor and others (731). 

The change to life-centered objectives and more progressive methods in 
many schools has brought achievement testing closer to personality 
measurement. Wrightstone (858) in his battery for appraising results of 
social studies teaching, included tests of ability to interpret data, to apply 
generalizations, to organize facts, and to judge civic beliefs and attitudes. 
Noll (730) devised items for measuring accuracy, intellectual honesty, 

open-mindedness, suspended judgment, insight into cause and effect, and 
self-criticism, all of which was summed up as the “scientific attitude.” 
Peatman and Greenspan (739, 740) reported an instrument for measuring 
superstitious belief among elementary-school children. 

Ordinary objective tests of school achievement may reflect certain per- 
sonality traits. Wiley and Trimble (844) asked students to indicate doubt 
or certainty on items, and suggested that this measures some characteristic 
of the individual. Hertzman (643) studied confidence ratings in connection 
with memory for names and photographs. Cheating on classroom tests has 
been used as a measure of dishonesty by a half-dozen investigators (593, 
616, 717, 774). Corey (592) found no correlation between verbal expres- 
sions of attitude about cheating and the amount of actual cheating done 
during self-grading of test papers. Parr (738) and Carlson (577) analyzed 
the conditions of pressure and classroom morale under which cheating is 
most apt to arise. Teachers more sensitive to behavior problems and apt 
to use marks for motivation reduced the amount of cheating. 

Other studies of honesty were made by Wrightstone (857) using Maller’s 
self-marking test, on which progressive schools showed up as more honest 
than traditional disciplined schools, and by Ruzicka (765) who improved 
on Zillig’s experiments described in our review of 1932. Maller (701) in- 
cluded his self-scoring test, along with measures of association, self-report 
of symptoms, and ethical judgment in a “CASE Inventory” with two 
comparable forms adapted to persons in fifth grade or beyond. 

Lewin’s concept of “level of aspiration” was found by Frank (619) to 
be consistent for the same individual in three different tests. Meerovitch 
and Kandaratzkaya (712) compared level of aspiration in hysterical, 
normal, and organic-lesion children. Another Russian report (768) used 
rate of satiation at simple tasks as a measure of personality, and found 
mental defectives less tolerant of monotony. 


Factor Analysis 


Stephenson (794) defined “psychometry” as concerned with measure- 
ments of single traits in a large population, and “type psychology” as con- 
cerned with the measurement of a population of traits within a single per- 
son. The interrelations of the traits have been explored in several studies 
by the use of the statistical technic called “factor analysis.” Rexroad (757) 
secured ratings of 850 students on ten selected traits, and analyzed results 
by the centroid method. The first factor was heavily present (.69 to .85) 
in all the traits and seemed to be an approximation to a faculty ideal of 
what a student should be. A second factor was plus in class but minus in 
social life outside. Loadings with third and fourth were slight. Chi (586) 
analyzed ratings on pupils in the elementary school at the University of 
Chicago, and found evidence for consistent differences in point of view of 
different raters, persistent halo effect, a general factor (w ?) and specific 
factors for each trait. The general factor presumably accounted for about 
one-third of the variance, the specific factor for about half. McCloy (699) 
analyzed ratings of forty-three traits in a population of thirty-one students 
and named the four major factors “social qualities, dominance, individual 
qualities, and positive attitudes.” 
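
For readers unfamiliar with the centroid method, the loadings on the first factor may be sketched as follows: each trait's loading is the sum of its column in the correlation table divided by the square root of the sum of the whole table. The example is an illustration under stated assumptions only. The three-trait table is invented, the diagonal is assumed to hold communality estimates (the studies reviewed do not say here which estimates were used), and the function name first_centroid_loadings is hypothetical. Extraction of later factors, which requires residual tables and sign reflection, is omitted.

```python
import math

def first_centroid_loadings(r):
    """First centroid factor loadings for a square correlation table r
    (list of lists) whose diagonal holds communality estimates."""
    n = len(r)
    total = sum(sum(row) for row in r)
    if total <= 0:
        raise ValueError("table total must be positive for a first centroid factor")
    root = math.sqrt(total)
    # Loading of trait j = (sum of column j) / sqrt(sum of all entries).
    return [sum(r[i][j] for i in range(n)) / root for j in range(n)]

# Invented table: three traits, every intercorrelation .64, communalities .64,
# which is what a single common factor with loadings of .80 would produce.
table = [
    [0.64, 0.64, 0.64],
    [0.64, 0.64, 0.64],
    [0.64, 0.64, 0.64],
]
loadings = first_centroid_loadings(table)
# Share of total variance carried by the factor: mean of the squared loadings.
variance_share = sum(a * a for a in loadings) / len(loadings)
print(loadings, variance_share)
```

With this invented table the method recovers the loading of .80 for each trait, and the factor carries .64 of the variance.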

Two studies reported tests instead of ratings as a basis for factor study. 
Lurie (697) made up a test of 144 items presumably related to Spranger’s 
value types. The four basic attitudes appeared to be social, Philistine, 
theoretical, and religious. Line and his associates (691, 692) gave tests of 
reaction time, word association, oscillation, perseveration, and the Bern- 
reuter scores. The major factor emerging from analysis was called “ob- 
jectivity” and thought to be related to Spearman’s w; another factor was 
something like f. Carter, Conrad, and Jones (578) analyzed children’s 
annoyances by a factor method and distinguished: (a) general annoy- 
ability, (b) annoyance at dirt and disorder, and (c) annoyance at some- 
thing like injuries to self-esteem. 


Integral Personality 


The most significant trend in personality measurement has been from 
measures of single traits to methods which allow for the expression of many 
characteristics of the personality in their natural structure and relation- 
ships. This necessarily involves a rather free situation in which subjects can 
impose any desired structure upon fairly plastic materials. The statistical 
methods developed for measures along single linear scales are not very 
helpful in treating these more comprehensive patterns. Vernon (828, 829) 
used a method of matching to bring out correspondence of the voice, 
photograph, and interpretative personality sketch for the same individual. 
Vernon’s summary of results so far obtained in matching experiments indi- 
cated much closer relationships than correlations of single measures have 
usually given. 
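
Results of a matching experiment are judged against what blind guessing would yield. As a sketch of that baseline, and not of Vernon's own computations, the code below gives the exact chance distribution of correct matches when n sketches are paired at random with n records. The set size of five and the function names are hypothetical; the source does not state how the matching sets were arranged.

```python
from math import comb, factorial

def derangements(n):
    """Number of ways to arrange n items so that none is in its own place."""
    d = [1, 0]  # values for n = 0 and n = 1
    for k in range(2, n + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[n]

def prob_exactly(n, k):
    """Chance of exactly k correct matches when n items are paired at random."""
    return comb(n, k) * derangements(n - k) / factorial(n)

def prob_at_least(n, k):
    """Chance of k or more correct matches under random pairing."""
    return sum(prob_exactly(n, j) for j in range(k, n + 1))

def expected_correct(n):
    """Mean number of correct matches under random pairing."""
    return sum(k * prob_exactly(n, k) for k in range(n + 1))

# Assumed set of five items to be matched.
n_items = 5
print(expected_correct(n_items))
print(prob_at_least(n_items, 3))
```

Whatever the size of the set, blind matching yields on the average one correct match per set, so that a judge who places most of the items correctly has done far better than chance.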

The Rorschach test has now attained the position of an outstanding 
instrument in the measurement and diagnosis of personality. Nearly forty 
studies employing the Rorschach test should be covered in this review. 
The Rorschach Research Exchange is now in its second volume. Most of 
the work on this essentially qualitative rather than quantitative method 
has been in the direction of clinical validation by correlation with the 
results of other diagnostic procedures, rather than the establishment of 
statistical norms. A third edition of Rorschach’s original monograph (758) 
was published, and it is reported that an English translation will soon be 
available. Beck (553, 554) published several articles and the first book 
on the method in English. Kerr (666) gave the test to a large group of 
normal and defective children and found that the blind diagnosis agreed 
with the case histories. Kerr’s study (667) of twins showed some similarity 
but by no means identity of personality pattern. Ganz and Loosli-Usteri 
(623) used the method with forty-three feeble-minded boys and found 
definite characteristics which distinguished their performances from those 
of normal children. Marinescu, Kreindler, and Copelman (707) found that 
the interpretation of the Rorschach ink-blots seemed to follow the same 
laws as the conditioned reflexes of Pavlov. They applied the interpretation 
of the blots and their proposed physiological explanation to a study of the 
cerebral activity of twins, and found that this activity is very similar in 
twins and not at all similar between brothers and sisters of twins. Hertz 
suggested a method of administration (640) and published a historical 
summary of the literature (641), reviewing 152 titles. Hertz (642) also 
published Rorschach norms for the adolescent age group based on 300 
junior high-school students. 

Rosenzweig (760) outlined a project for validating the Rorschach method 
as a diagnostic instrument for functional mental disorders. The method is 
that of correct matching of case summaries and Rorschach interpretations. 
Schneider (767) discussed the application of the Rorschach to the measure- 
ment of vocational aptitude. Vernon (832) attempted the correlation of 
results on the Rorschach with other expressions of personality. He reported 
a correlation of .78 ± .06 between Binet scores and the clinician’s ability 
to estimate his subjects’ intelligence from reactions on the Rorschach. 
Using a matching technic, 36 of 55 matchings were correct, yielding a con- 
tingency coefficient of .83 ± .03. From experience with more than 350 sub- 
jects, including 100 normal adults, Guirdham (632) published a critical 
review of the method. G. E. Gardner (624) classified and tabulated the 
responses to the Rorschach blots of 100 normal adults of average I.Q. 
Klopfer and Sender (676) published a system of refined scoring symbols. 
Schachtel and Hartoch (766) noted that responses came in phases, and 
suggested a study of the sequences within the test, as well as the total 
scores for the test as a whole. Wells (840) compared and contrasted 
Rorschach procedures with association tests. Piotrowski (748) found that 
there are specific signs the presence of which in a record indicates that 
the personality of the subject has been affected by an organic disease of 
the central nervous system. 

Klopfer (675) reviewed critically the recent theoretical developments 
in the Rorschach method. He (677) also published instructions for the 
administration of the test. Sunne (807) published norms for young children 
based on tests of 1,655 white and 2,068 Negro children living in New Or- 
leans and 712 southern mountain children, and compared the Rorschach 
results with mental ages computed from intelligence tests. Beck (554) con- 
cluded that it is not possible at all times to interpret the same Rorschach 
factor as having precisely the same personality value. The trait meaning of 
any psychological process cannot be known, even after the process has 
been identified, until there is a picture of the personality as a whole. 
Vernon (831) published a summary covering publications since his review 
two years earlier. Skalweit (780) and Bleuler (561) engaged in a con- 
troversy over the relation of constitutional factors to the personality pat- 
tern revealed by the Rorschach. Frankel and Benjamin (618) commented 
particularly on the subject’s self-criticism during the Rorschach test. An 
attempt at comparing races was made by Hunter (653), who tested 100 
white and 100 Negro adults of comparable intelligence, education, occu- 
pation, and environment. The white group was more introversive, the 
Negroes more extraversive. An article by Juarros (663) indicated interest 
in the Rorschach method in South America. 

Another test based on pictures uses more definite representation of per- 
sons expressing affect in undefined situations. Morgan and Murray called 
this a “Thematic Apperception Test” (720). Subjects are asked to tell 
what may be in the mind of the person in the picture, what has probably 
happened to bring this about, and what is likely to develop as a result. 
The large range of freedom permits the subject to formulate the story in 
accord with determinants from within his own experience. Sterzinger (795) 
experimented with pictures each designed to arouse one of the instincts 
named by McDougall. The Schwartz social-situation test is made up of 
pictures presenting rather definite misbehavior by children. Harriman 
(635) used it with delinquent women. The “hen test” used by two Polish 
psychologists (659) in exploring moral feelings of children from eight to 
fourteen years of age, is based on a cartoon of two boys whose malicious 
prank causes the cruel death of four hens. 

Graphology is still a disputed area. Cantril and Rand (574) conducted 
one of the best experiments. Six subjects, each of whom was markedly rep- 
resentative of one of Spranger’s value types, prepared a sample of handwrit- 
ing which was then to be classified by 26 graphologists and 26 laymen. 
The marked success of the graphologists (C = .93) as compared with the 
indifferent results from untrained persons (C = .17) indicated that expert- 
ness should be respected. Practically all graphologists agree that diagnosis 
from handwriting, as from a Rorschach test, must be made in terms of an 
integrated picture, not in terms of item-by-item correlation of handwriting 
details with character traits. Hence experiments like Stackman’s (785) are 
hardly relevant. Inui (656) reported that the best estimates are made by 
comprehending the writing rhythm. Pohl (750) theorized about the rela- 
tive competence of cyclothymes and schizothymes, Feuerstein and Schön- 
feld (590: 611-27) about differentiating handwriting of different special- 
ized groups among physicians, and Reinhardt (753) about heredity and 
environment in the graphological analysis of twins. 

Stagner (787) tried to discover by rating specific aspects of voice and 
of personality (twenty-five judges and ten speakers), on what the judg- 
ments of personality from speech were based. Aggressive behavior and 
nervousness seemed to exert a clear influence on both judgments. Lange 
(684) found still photographs a poor method for estimating good automo- 
bile drivers. 


Observation 


Direct observation of behavior in natural life situations combines some 
of the advantages of objective testing and some of the freedom in the sub- 
ject’s self-expression which is required for interpretation in terms of the 
whole personality. If records can be made quickly and accurately, and 
without distorting the naturalness of the behavior, this is probably the most 
valid approach to personality testing. The sound motion picture would 
probably be the ideal recording instrument except for the expense and the 
problem of arranging it so that the subject does not know his behavior is 
being recorded. 

Technics of observation have been worked out particularly by clinical 
psychologists and in the child development field. The trend is away from 
the Dorothy Thomas type of recording of single elements, toward more 
integrative description of the action-in-situation. Weiss (839) described 
the way in which behavior may be observed in the waiting room of a 
guidance clinic. Fries (621) made good use of observation in a play group. 
Homburger (646) asked college students to carry on dramatic play with 
children’s toys, and found the play revealing inner emotional tensions; 
Kinder and Humphreys (668) found the examination of mental defectives 
facilitated by observation of their free behavior in a situation which offered 
a variety of test objects. Washburn (836) improved the technic of recording 
activities among young children. Randall (751) described an “anecdotal 
behavior journal” such as might well be kept by every progressive teacher, 
to give evidence of child needs and of personality growth under tutelage. 
Pistor (749) observed the group action, cooperation, development of indi- 
vidual interests, and fostering of creative ability among school children, and 
used the results to help in evaluation of progressive education. Huth (654) 
who once was an exponent of a variety of specific psychological tests, ap- 
parently has concluded that the best evidence for judging personality is 
obtained from the continuous observations of parents, teachers, and youth 
leaders. 

More specific aspects of behavior have been brought under observation 
in other studies. Eisenberg (612) recorded the gait, speech, gestures, words, 
and other forms of expression related to feelings of dominance. Strehle’s 
book (801) attempted an analysis of the meaning of posture and motion 
and gestures in each part of the body. A rating scale for the amount of 
energy and vigor expressed from moment to moment in the activities of 
children was developed by Fales (615). Childers (587) found insecurity at 
the root of much of the hyperactivity. Shacter (771) recorded the time 
during which children could sustain their interest in simple tasks, and com- 
pared the results with ratings on introversion, using the Marston scale. 
Dudycha (606) studied punctuality in a variety of situations and concluded 
that considerable consistency could be found in the habitual behavior of 
given students. A very excellent record based on several approaches to 
behavior in young children was worked out by Markey (708). 

Two fruitful suggestions for further observation are worth noting. Ge- 
melli (626) urged the study of how the individual meets new and difficult 
situations, as better than the more static classification of types. He suggested 
that tests, interviews, motor tasks, and general behavior observation might 
all be used to supply data on the individual pattern of adaptation. Rosen- 
zweig (761) proposed a test in which subjects work at insoluble problems. 
The question is whether, when the subject gives up, he blames the task, 
blames himself, or finds excuses for all concerned. 

An excellent review of contributions to child development, largely by 
observation methods, has been prepared by Jones and B. S. Burks (662). 


Combinations of Measures 


Occasionally a study achieves exceptional value because of its use of a 
variety of approaches no one of which alone could have contributed so 
much. An outstanding example is L. B. Murphy’s study (725) of sympa- 
thetic behavior in children. Most of the data were gathered by prolonged 
observation in a nursery school situation, but these were illuminated and 
supplemented by sociological study of the home and neighborhood, excel- 
lent rating scales, conduct tests in controlled situations, case studies, socio- 
metric analysis, Rorschach test results, and analysis of findings in relation 
to various theoretical systems. A technic of special value was the use of a 
dial rather than the traditional profile for graphic representation of strong 
and weak aspects of the all-around personality. A lesser study of sociability 
by Bowley (565) used observation of social contacts, verbal social re- 
sponses, and a social index. Dimock (603) made a long-term study of ado- 
lescent boys, including physical measures, mental tests, the Sweet Personal 
Attitudes Test, moral knowledge tests, rating scales, behavior observations, 
participation records, and other data. Even more comprehensive studies 
are in process in the California Growth study. Simoneit (777) described 
the combination of tests, ratings, interviews, and observation applied in the 
selection of German army officers. 

Many forms have been prepared for recording data about individuals. 
Van Alstyne (823) described the record system at the Francis W. Parker 
School in Chicago, which makes a place for the cumulative recording of 
classroom observations, test results, ratings, questionnaires, and objective 
data. A German school psychologist (764) outlined his method of pupil 
analysis including the following: informal observation; data on health; 
family and environment; analysis of pupil’s speech, writing, drawing, 
bearing, skills, and emotional life; test results; and the personality seen 
as a whole. The Roumanian government makes compulsory in public 
schools a personality inventory (622, 716, 727) based largely upon obser- 
vation but including: intelligence tests; tests of memory, attention, imagi- 
nation, etc.; data on heredity, social environment, and cultural level; 
family; results of medical examination; data on emotivity, temperament, 
will, and character; and a general personality characterization. The record 
is presented at school-leaving as a basis for vocational guidance. 








CHAPTER VII 


Applications of Tests of Non-Intellectual Functions¹ 


CHARLES CECIL UPSHALL 


THE PERIOD UNDER REVIEW has been very prolific. Approximately 1,000 
studies, investigations, articles, and books have appeared on the topics cov- 
ered in this chapter. Nearly 500 studies were found which used measuring 
instruments, for example, questionnaire type tests, self-ratings, ranking of 
interests, ratings of others by experts or groups, time sampling procedures, 
more or less controlled observation, controlled or free interviews, and asso- 
ciation tests. Over 500 titles were found which indicated that problems of 
personality, character, delinquency, and personnel procedures had been 
discussed or evaluated. 

Out of this large number of references, 120 studies have been selected 
for special consideration in this chapter. The following criteria of selection 
have been applied: studies which used the best experimental technics were 
first selected; studies containing large populations were included rather 
than those with small populations when similar problems were being stud- 
ied; other things being equal, those studies which made use of the most 
complete statistical procedures in reporting results were chosen. The num- 
ber was further limited by giving preference to those studies which used 
well-known measuring instruments. Exceptions to this rule were made when 
the excellence of the study and the significance of the results seemed to jus- 
tify inclusion in this review. Finally, in those fields where there were a 
large number of studies, those were selected which gave the truest picture 
of the conclusions drawn from all. 


The studies reviewed have been grouped under the following captions: 


1. Investigations of personality in so-called normal populations 
2. Investigations of personality in clinically abnormal populations 
3. Attitudes and interests 
4. Character, behavior, and delinquency 
5. Occupational fitness and guidance. 


Personality—Normal Persons 


Marital relationships—Many interesting facts about the personalities, 
adjustment problems, and mental hygiene of married people have been 
reported during the 1935-37 period. Many different measuring devices 
have been used. Schooley (951) found that the 80 married couples which 
were studied tended to be more similar in personality than random individ- 
uals at the beginning of their married life and grew more similar with time. 





¹ Bibliography for this chapter begins on page 353. 
Willoughby (981), in a detailed study of 152 married couples, found, among 
many other things, that husband and wife tended to be alike. For example, 
the coefficient of correlation between the neuroticism score of the wife and 
that of the husband was .27. Johnson and Terman (918) compared the 
personality characteristics of happily married, unhappily married, and 
divorced persons. The unhappily married gave evidence of being more 
neurotic and introvertive, having more intolerant attitudes and more voli- 
tional inadequacy than the happily married. The latter had more uplift 
interests and social adaptability. The happily married men showed more 
tolerance and sympathy than the happily married women. In general, di- 
vorced women had more self-reliance, independence, tolerance, initiative, 
and conative intensity than the women in the other two groups. Both men 
and women who were divorced had more intellectual interests than either of 
the married groups. Terman and Buttenwieser (973) gave the Bernreuter 
Personality Inventory and the Strong Vocational Interest Blank to married 
couples. The relationships obtained between the scores of husband and wife 
were small but positive. Some items proved to be diagnostic of marital com- 
patibility. Bernard (871) also studied the problem of personality factors 
in marriage by means of the Bernreuter Personality Inventory. He found 
a few positive relationships such as those between (a) health of husband 
and his marital dissatisfaction, (b) health of wives and marital dissatisfac- 
tion of husbands, (c) use of birth control methods and neuroticism in 
women, (d) neuroticism and marital dissatisfaction in women. 

Adjustment—Conklin (883) gave the Thurstone Personality Schedule 
to 100 college students in an effort to find the influence of family adjust- 
ment on the neuroticism score. Those who admitted family difficulties mani- 
fested a greater tendency to abnormalities of personality than those who did 
not. Crook (886) studied the constancy of neuroticism scores and self-judg- 
ments of constancy among college students. The Willoughby adaptation of 
the Clark-Thurstone Personality Schedule and self-ratings of change were 
used. Applications of the measuring instruments were made in September 
and May. Neuroticism scores were found to be less stable than intelligence 
ratings. Self-ratings of change were very inaccurate with a constant error 
favorable to the student. 

Willoughby (982) gave a modified form of the Thurstone Personality 
Schedule to more than 500 unmarried women whose ages ranged from fif- 
teen to seventy-eight years. Age, education, and consciousness of a current 
emotional problem were related to emotionality; occupation was not. Dario- 
tis (887) concluded after an intensive study that the validity of the Thurstone 
Personality Schedule is “highly questionable” as a measure of neurotic 
tendencies among college students. Hardy (908) made an interesting study 
of the adjustment scores of adolescents having a history of frequent illness 
during childhood. A low negative coefficient of correlation was obtained 
between frequency of illness in childhood and personality adjustment scores 
during adolescence or early maturity. Burnham and Crawford (880) stud- 
ied the vocational interests and personality of a pair of dice as these were 
indicated by the Strong Vocational Interest Blank, the Bernreuter Person- 
ality Inventory and the Thurstone Personality Schedule. No one should use 
any of these three inventories for the purpose of making recommendations 
for individuals until they have read this study. The answer to each item 
on ten copies of each scale was determined by throwing a pair of dice. The 
Thurstone Personality Schedule showed that the dice were emotionally mal- 
adjusted; on the Bernreuter Inventory the percentile ranks were well above 
50 when each section of the scale was arranged from the most to the least 
desirable traits. Nine of the ten vocational interest blanks indicated an inter- 
est pattern typical of successful Boy Scout masters. The authors make the 
following pertinent statement: “From these data it may be concluded that 
it is perfectly possible to secure by chance scores on these tests of a nature 
which, if made by human subjects, might be regarded as significant, and 
which in present practice are frequently so interpreted.” Keys and Guilford 
(920) arrived at the same conclusion for the Bernreuter Personality Inven- 
tory. 
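
The logic of the dice experiment can be illustrated with a small simulation. The sketch below is not the authors' procedure: the number of items (125), the two-choice answer format, and the scoring key are hypothetical stand-ins, and a seeded pseudo-random generator takes the place of the dice. It is meant only to show that answering by chance spreads scores over a range, so that some chance records land at values which could be read as meaningful.

```python
import random

N_ITEMS = 125   # hypothetical inventory length, not taken from any real scale
N_BLANKS = 10   # the study filled in ten copies of each scale by chance


def random_blank(rng, n_items=N_ITEMS):
    """Answer every item 'yes' (True) or 'no' (False) purely by chance."""
    return [rng.random() < 0.5 for _ in range(n_items)]


def score(blank, key):
    """Count the answers that agree with the keyed direction of each item."""
    return sum(1 for answer, keyed in zip(blank, key) if answer == keyed)


def simulate(seed=0, n_blanks=N_BLANKS, n_items=N_ITEMS):
    """Return the scores of n_blanks chance-answered copies of one scale."""
    rng = random.Random(seed)
    # Hypothetical scoring key: the answer that counts toward the score per item.
    key = [rng.random() < 0.5 for _ in range(n_items)]
    return [score(random_blank(rng, n_items), key) for _ in range(n_blanks)]


if __name__ == "__main__":
    scores = simulate()
    print("chance scores:", scores)
    print("lowest:", min(scores), "highest:", max(scores))
```

Running the sketch with different seeds gives different sets of ten scores; how far the extremes would be read as "significant" depends on the norms of the particular scale, which are not modeled here.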

Brown (876) studied, by means of the Brown Personality Inventory, the 
influence of race and locale upon the emotional stability of 712 children. 
Differences in race and locale were not related to emotional adjustment but 
adjustment was related to socio-economic level, the higher level being 
better adjusted. Duggan (895) found physical education majors more 
stable emotionally, more extraverted, and more dominant than undergrad- 
uate women who were majors in other subjects. Forlano and Axelrod (903) 
studied the effect of repeated praise or blame on the performance of fifth- 
grade children who were classified as introverts or extraverts by means 
of the Pintner personality test. The introverts made gains in the learning 
situation sooner than the extraverts. In general, both groups of these chil- 
dren made better progress as a result of blame than of praise. Forlano and 
Watson (904) found that groups which were successful in military train- 
ing were consistently more extraverted than those which were not very suc- 
cessful or were failing. Extraversion was measured by the Inex Self-rating 
Scale. 

Family resemblances—Family resemblances in personality traits as meas- 
ured by personality inventories of the questionnaire type have been studied 
sufficiently to indicate certain consistent trends. Pintner and Forlano (943) 
gave two questionnaire type personality tests to 157 pairs of siblings in 
Grades IV to VIII, inclusive. The intersibling correlation on an intelligence 
test was .23, on one of the personality tests, .19, and on the other .20. Sib- 
lings of the same sex were more alike than siblings of opposite sex. Sward 
and Friedman (968) gave the Bernreuter Personality Inventory to 387 
triads composed of mother, father, and offspring. Temperamental resem- 
blances were lower than resemblances in intelligence or physical traits. 
Children tended to resemble the parent of the same sex more than the parent 
of opposite sex. Carter (881) found that identical twins tended to be more 
similar on the Bernreuter Personality Inventory than fraternal twins. When 
the 133 pairs of twins were compared with a control group, the twins were 
more extraverted, self-confident, sociable, gregarious, and stable. No sig- 
nificant differences between twins and controls were found for dominance or 
self-sufficiency. Yule (986) gave a battery of tests of the Stephenson type 
to 115 pairs of twins and a control group of 60 unrelated children. Mono- 
zygotic twins were more alike than dizygotic and dizygotic of like sex were 
more alike than those of unlike sex. Ushijima (976) found a coefficient of 
correlation between father and daughter of .207 and between mother and 
daughter of .304 when the Awoji and Ohabe Extraversion-Introversion Test 
was used. Stagner and Katzoff (963) studied the relationships between per- 
sonality as measured by the Bernreuter Personality Inventory and order of 
birth and family size. Order of birth was not related to personality. A little 
more independence was found in those who had younger brothers or sisters 
and there was a slight personality advantage in favor of those from small 
families. Schubert and Wagner (953) used a modified form of the Wood- 
worth-Mathews personal data sheet with 229 boys and 248 girls in high 
school and 117 transient boys. The transient was particularly maladjusted 
to his family situation but on the whole he was relatively stable emotionally. 
The academically successful boy and the academically unsuccessful girl 
were more unstable than the other high-school seniors; only children did 
not show as many signs of unbalance as those of larger families. 

Racial differences—Shen (955) found that the Chinese were more neu- 
rotic, more introverted, less self-sufficient, and less dominant than Ameri- 
cans as judged by their responses to the Bernreuter Personality Inventory. 
Chou and Mi (882) found that Chinese and American students differed in 
the same direction on the Thurstone Personality Schedule when used in a 
Chinese translation with 850 Chinese students. Pai, Sung, and Hsü (937) 
gave the Thurstone Personality Schedule to 617 Chinese males. The mean 
score was 51.82. The coefficient of contingency between the neurotic score 
and the clinical diagnosis was .47. Sward and Friedman (969) compared 
the responses of 625 adult Jews with those of 625 adult non-Jews on the 
Bernreuter Personality Inventory and the Heidbreder Introversion Inferior- 
ity Questionnaire. Neurotic and inferiority scores of Jews exceeded the mean 
of the non-Jew group by about 60 percent. Sward (970) gave the Bernreuter 
Personality Inventory to 114 Jewish families and 113 non-Jewish families. 
Four distinguishing patterns were found for the Jews: (a) gregariousness 
or strong social dependence, (b) submissiveness, (c) drive and over action, 
(d) various anxiety states. Garth and Garth (906) gave the Allport A-S 
Test to 269 educated Indians and 101 white males. The white males were 
definitely more assertive than the Indian males. 
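
Several of the studies above report a coefficient of contingency. As a reminder of what that statistic expresses, the sketch below computes Pearson's C = sqrt(chi2 / (chi2 + N)) from a table of counts. The table is hypothetical and is not drawn from any study cited here; the function name is illustrative, and every row and column total is assumed to be greater than zero.

```python
import math


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for a table of observed counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


if __name__ == "__main__":
    # Hypothetical 2 x 2 table: test classification (rows) by diagnosis (columns).
    example = [[30, 10],
               [10, 30]]
    print(round(contingency_coefficient(example), 3))  # about 0.447
```

Because the upper limit of C depends on the size of the table (for a 2 x 2 table it cannot exceed about .707), coefficients from tables of different dimensions are not directly comparable.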

Social maturity and adjustment—Oldham (935) found no significant 
relationship between the socio-economic status of 319 adolescent Negro 
girls and personality as measured by a battery of intelligence and person- 
ality tests. Diamond (888) found a definite relationship between change in 
personality and radical political activity but could not define the exact 
nature of the relationship. Symington (971) gave a battery of personality 
and mental tests and questionnaires to ten groups of subjects totalling 612. 
Two hundred and eighty-seven had a conservative religious background 
and 325 had a liberal religious background. Liberalism in religious thought 
was not related to the types of personality measured by the Bernreuter or 
the Allport A-S tests. There was a positive relationship with intelligence, 
amount of education, and attendance at college courses of a liberal type. 

Pintner, Forlano, and Freedman (942) found that chronological age and 
mental age were more closely related to choice of friends than personality 
and attitude test scores. Eight hundred and nineteen pupils in Grades V 
to VIII were used. 

Kirkendall (921) found no relationship between changes in adjustment 
and changes in home environment when these were measured by the 
Symonds Adjustment Questionnaire and the Myers Intra-family Question- 
naire. Age bore some relationship to changes in adjustment, and the ad- 
justment of over-age pupils presented a more acute problem than that of 
under-age pupils. Young, Drought, and Bergstresser (985) found that 
neither scores on the Bell Adjustment Inventory nor scores on the Wiscon- 
sin Scales of Personality Traits were related to the discrepancy which 
existed between the predicted scholarship of University of Wisconsin fresh- 
men and their actual scholarship. They concluded that “emotional factors, 
as such, are not important apart from the inner state or attitude of the 
person experiencing that situation.” 

Houtchens (916) used the Luria version of the free association test with 
three groups of junior high-school boys. One group, in the opinion of their 
teachers, was composed of boys least well adjusted, another was composed 
of boys who were best adjusted, and the third served as a control group. 
The conclusion was reached that teachers select as their best adjusted chil- 
dren pupils who, according to the test, are maladjusted. 

Bradway (875) and Doll and McKay (894) used the Vineland Social 
Maturity Scale to study the social maturity of three types of handicapped 
children. Bradway found that a group of 92 deaf children, of ages five to 
twenty, were 20 percent inferior in social competence at each age level. 
Doll and McKay matched 38 children from the special classes in Vineland, 
New Jersey, with 38 children of similar chronological and mental ages who 
were living in the institution. The scale showed that special-class children 
were superior to institutional children in social maturity and especially on 
those items of the scale where self-direction carried the most weight. 

Miscellaneous—Habbe (907) found no personality differences between 
48 boys with normal hearing and 48 boys with impaired hearing on three 
measures of personality. There were more speech difficulties among those 
boys whose impairment of hearing was {5 decibels or more. Brunschwig 
(878) found very few significant differences on personality traits between 
a group of children with normal hearing and a group with impaired hear- 
ing; the hearing children were higher in social adjustment. Meltzer (929) 
gave the Rorschach Ink Blot Test to stuttering and non-stuttering children. 
The stutterers were more talkative than the non-stutterers. Rate of speaking 
was about the same for the two groups. A description of certain variations 
in the responses of the two groups was given. Stone and Barker (965) found 
no significant differences between postmenarcheal and premenarcheal 
girls of the same chronological age on the Bernreuter Personality Inventory. 
Significant differences were found between the two groups on the Pressey 
Interest Attitude Scale and the Sullivan Test for Developmental Age. Wrenn, 
Ferguson, and Kennedy (983) found no significant differences on the neu- 
rotic and introversion scores of the Bernreuter Personality Inventory be- 
tween one group of junior college students selected from the upper 5 per- 
centiles of an intelligence test and another group selected from the lower 
15 percentiles. Superior men and women were found to be more self-suffi- 
cient and inferior men more dominant. 

Ayer and Bernreuter (867) studied the relationship between discipline 
and personality traits in little children. There was a positive relationship 
between attractive personality traits, as measured by the Merrill-Palmer 
Personality Scale, and allowing children to profit by the natural result of 
their acts. Bartlett (869) found no significant relationships between sug- 
gestibility as measured by the Hull “Sway Test” and personality traits as 
measured by the Bernreuter Personality Inventory and two of Maller’s 
character tests. Mott (932) found a coefficient of correlation of .66 between 
personality as measured by Marston’s Personality Scale and activity ratings 
derived from children’s drawings. Line and Griffin (925) used Thurstone’s 
multiple factor technic in analyzing the results from a battery of tests to 
find the factors underlying mental health. Two factors were found which 
separated the unstable from the stable. Factor one was probably related to 
“objectivity” of response; factor two was probably related to “fluency” or 
“mobility” of response. Brown (877) obtained a mean score on the Brown 
Personality Inventory of 24 for orphan boys from an institution. The mean 
score for the girls was 29. The mean score for boys living with their parents 
was 17 and that for girls 18. The mean score for boys living with their 
parents but having a low economic status was approximately the same, 23, 
as the mean score of boys from an institution. Schott (952) found marked 
differences between the means of three groups (200 normals, 130 applicants 
for professional positions, 300 neuropsychiatrics) on the Thurstone Per- 
sonality Schedule. The schedule, however, failed to show the degree of 
maladjustment. Terman and Miles (974) made extensive studies of the 
relationship existing between sex and personality by means of masculinity- 
femininity tests. 


Personality—Abnormal Persons 


A majority of investigators who have used the questionnaire type inven- 
tories of neurotic tendency, or the various free association tests, with reason- 
ably large groups of unselected normals and groups diagnosed by clinical 
methods to be psychopathic, have found statistically reliable differences 
between the groups; but there always is too much overlapping for the 
measuring instruments to be relied upon to give a true prediction in the 
case of any given individual. 
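
The distinction drawn here, between a reliable difference in group means and a dependable prediction for one person, can be made concrete with a worked example. The sketch below rests on assumptions that the studies reviewed do not make: both groups are taken to be normally distributed with equal variance and equal size, and an individual is assigned to a group according to which side of the midpoint between the two means his score falls on. The function name and the chosen values of d (the difference between means in standard-deviation units) are illustrative only.

```python
from statistics import NormalDist


def misclassification_rate(d):
    """Share of individuals falling on the wrong side of the midpoint cutoff.

    Assumes two equal-sized normal groups with equal variance whose means
    differ by d standard deviations.
    """
    return NormalDist().cdf(-d / 2)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0):
        print(f"d = {d}: {misclassification_rate(d):.1%} misclassified")
```

Under these assumptions, even a difference of a full standard deviation between group means, which large samples would show to be highly reliable, leaves roughly three individuals in ten on the wrong side of the cutoff.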

Dimmick (889) used the Rorschach Ink Blot Test with 85 cases of 
dementia praecox of three clinical subtypes. Although several differences 
between the clinical subtypes approached statistical significance, the author 
thinks that the test needs more objective classification of the responses 
before it will be of great use in clinical work. This conclusion was generally 
reached by the many men and women who used the test during the period 
under review. Preda and Popescu (944) used the 100 word list of the Kent- 
Rosanoff free association test with 40 normals, 20 men and 20 women, 
paired with 40 insane men and women. The differences between the two 
groups ranged from 5 percent to 30 percent. It was also found that com- 
plexes are most important in the case of women. Preda, Stoenescu, and 
Cupcea (945) used Jung’s free association method with 30 mental patients, 
with the following results: the reaction time of manics was relatively short; 
a delayed reaction showed the existence of certain traumas and complexes. 
They concluded, however, that it was impossible to differentiate objectively 
between certain important preoccupations of the patient, his delusions, and 
complexes by this method alone. Heuyer and Courthial (912) found that 
a combination of a modification of the Woodworth-Mathews personal data 
sheet and the Pressey X-O Test yielded results which were in agreement 
with careful psychiatric findings in 75 percent of the 114 cases studied. 
McNemar and Landis (927) gave a simplified form of the Willoughby 
Emotional Scale to 65 psychopathic women. There was no relationship 
between whatever the scale measures and age, educational status, or clinical 
diagnosis. Williams and Mendenhall (979) gave the Hull “Sway Test” for 
suggestibility four times to 100 epileptics. Eighty percent gave the same 
response all four times. The group had a much larger proportion of zero 
scores than normals. Stoenescu (964) gave the Toulouse-Pieron Attention 
Test to 148 insane people. The range of mean scores was from zero for 
idiots to 108 for the paranoia group. The normal score is given at 134. 
Page (936) studied the relationship between superstition and personality 
by means of a questionnaire and the Heidbreder and Neymann-Kohlstedt 
Introversion Tests in 50 manic-depressive cases, 50 dementia praecox cases, 
and 50 normals. Belief in 6 of the 25 superstitions was reliably more fre- 
quent in the psychotics than in the normals. No consistent relationship 
between introversion-extraversion and belief in superstition was found 
among the normals. 


Attitudes and Interests 


The Thurstone Attitude Scales have been used extensively during the 
three-year period under review. Attitude toward war has been a favorite 
subject of study. Sowards (959) used Thurstone’s Attitude-toward-War
Scale with high-school seniors, college freshmen, and college seniors in the 
same community. He found that the trend was consistently toward pacifism 
as more schooling was obtained but that the differences between the groups 
were not statistically reliable. Farnsworth (902) used the Peterson-Thur- 
stone Attitude-toward-War Scale and the Bernreuter Personality Inventory 
with the same group of college men in 1932, 1933, 1934, and 1936. There 
was no change in score at the end of the first year but in both 1934 and 
1936 there was a slight change in the direction of pacifism. He found no 
relationship between attitude toward war and scores on either the Bern- 
reuter Personality Inventory or the Thorndike Intelligence Examination 
for High School Graduates. M. Smith (957) used the Droba-Thurstone 
Attitude-toward-War Scale with University of Kansas elementary sociology 
students. There was a shift towards pacifism during the semester of five- 
tenths of a scale point. Women were more against war than men. Pihlblad 
(941) used the Peterson-Thurstone Attitude-toward-War Scale with 484 
men and women from one college and 100 men from another college. He 
found great unanimity of opinion. There was a definite piling up of the 
scores at the point in the scale which indicated mild opposition to war. 
He suggested that perhaps the scale is too insensitive to be used as a satis- 
factory measure of American college students’ attitudes toward war. Traxler 
(975) used the Droba-Thurstone Attitude-toward-War Scale with high-
school pupils. At this level the score was not related to educational level.
Reliability coefficients of the scale ranged from .635 to .806. Doubt was 
expressed as to the validity of the scale for the measurement of attitude 
toward war of high-school pupils. Gardner (905) obtained evidence of 
change as a result of his teaching technic in a group of junior high-school 
pupils in attitude toward war as measured by the Peterson-Thurstone Atti- 
tude-toward-War Scale. Koga (924) gave the Peterson-Thurstone Attitude- 
toward-War Scale to 1,642 Japanese college students. Not many outstand- 
ing differences between the attitude toward war of Japanese and American 
students were found although the Japanese were consistently more favorable 
toward war than Americans. Stump and Lewis (967) had 80 ministers 
from several different denominations fill out the Droba-Thurstone Attitude- 
toward-War Scale. Five percent were described as neutral, the rest were 
strongly pacifistic. The older ministers were less extreme in their attitude 
toward pacifism. A coefficient of correlation between age and score on the 
scale of —.337 was found. 

Attitude (prohibition)—Knower (922, 923) used the Smith-Thurstone 
Attitude-toward-Prohibition Scale in a controlled experiment involving 
approximately 1,000 subjects—25 percent of the experimental group showed 
statistically reliable changes in attitude as a result of the presentation of 
oral arguments. He also found that there were only low relationships 
between attitude-influencing factors such as speeches, speakers, and interest 
and intelligence. Gardner (905) concluded that accumulative effects of a
lecture, a story, and a “chalk talk” about the use of alcohol with a group 
of college freshmen caused a change in attitude toward prohibition as 
measured by the Smith-Thurstone Scale. 

Attitude (social)—Eckert and Mills (900) studied the relationship be- 
tween international attitudes as measured by the Neumann Test of Inter-
national Attitudes and scholarship and certain social factors. The inter-
nationally-minded high-school senior had a higher scholarship rating than
the nationalistically-minded senior. Religious affiliation and having an 
older sibling in college were more potent in determining international atti- 
tude than instruction in the social studies. Morgan and Remmers (931) 
found that there was greater liberalism in college students they studied in 
1933 and 1934 than those whom they studied in 1931. The Harper’s Social 
Study Questionnaire was used. The college students were found to be more 
liberal than their parents. 

Newcomb and Svehla (933) gave the Thurstone Attitude Scales toward 
church, war, and communism to 1,568 individuals in 558 families to de- 
termine the extent of covariation of the attitudes within families and the 
factors upon which such relationships may depend. The obtained correla- 
tion seems to justify the conclusion that the personal influence of family 
members upon each other is effective chiefly through the kind of institu- 
tional influences which they bring to bear upon each other. 

Attitude (toward the Negro)—Bolton (873) and Sims and Patrick (956) 
used the Hinckley Scale for measuring attitudes toward the Negro. Bolton 
obtained a reliability of .40 for the scale when used with southern students. 
These students showed no significant change in attitude as a result of in- 
creased knowledge of Negro education. In another experiment, Bolton 
(874) found that advanced students were more liberal in their attitude 
toward the Negro’s social rights than were freshmen. These southern stu- 
dents were least willing to recognize those rights connected with social 
intermixture of the races. Sims and Patrick made an interesting investiga- 
tion. One hundred and fifteen southern students in the University of Ala- 
bama were paired with 115 northern students in the same university. These 
groups were compared with a group of 97 students from Ohio University. 
The mean scores of the groups were 5.0, 5.9, 6.7, respectively. Length of 
attendance in college did not influence attitude toward the Negro but north- 
ern students attending the University of Alabama tended to become more 
prejudiced toward the Negro as they remained in the institution longer. 

Attitude (students)—Buck (879), Stagner (962), and Wilke (978) 
studied student attitudes by means of questionnaires and self-rating devices. 
Buck used a 375-item questionnaire on moral attitudes, anxieties, interests,
etc. Approximately 2,000 students were studied during the ten-year period
of the investigation. The author concluded that the evidence showed there 
was a tendency for student opinion to become more liberal. There was a 
lessening of disapproval of debt and of socialism. Stagner found that stu- 
dents were more liberal than their parents, especially the girls. When 
parents disagreed on political party, boys tended to follow their father and
girls the mother. Wilke studied student opinion in relation to age, sex, and 
general radicalism by means of an attitude scale of his own making. He 
found no conclusive relationship although women tended to be more radical 
than men. 

M. M. Smith (958), in an excellent study of comparative social attitudes 
based on four populations of 1,176 high-school seniors, 288 parents of 
these seniors, 192 of their teachers, and 83 university professors, found 
little evidence that social studies instruction was effective in developing 
intelligent opinion toward issues basic to citizenship, beyond that which 
the students shared with their parents. A partial coefficient of correlation 
between the pupils’ attitudes and the parents’, with teacher or professor
influence ruled out, was .61. The influence of the professors on the attitude 
of the seniors, when influence of parents and teachers was held constant, 
was indicated by .01. 
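
As a present-day illustration of what "ruling out" a third influence means, the minimal sketch below computes a first-order partial correlation from three simple correlations. The function name and the numerical values are assumptions chosen for illustration; they are not Smith's data, and his analysis, which held two influences constant, would call for a higher-order partial.

```python
import math

def partial_correlation(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2 with variable 3 held constant."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    return numerator / denominator

# Hypothetical correlations (illustration only, not taken from the study):
# 1 = pupil attitude, 2 = parent attitude, 3 = teacher attitude.
print(round(partial_correlation(0.50, 0.30, 0.40), 3))
```

When the third variable is unrelated to the other two, the partial coefficient reduces to the simple correlation, which is a convenient check on the arithmetic.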

Effects of depression on attitudes—Probably the most thorough investi- 
gation of the effects of the depression on the attitudes of individuals was 
made by Rundquist and Sletto (949). Six scales designed to measure 
morale, feelings of inferiority, family adjustment, economic conservatism, 
attitudes toward law, and the value of education were made and adminis- 
tered to approximately 3,000 persons in a wide variety of economic and 
social strata. Differences in attitude toward the economic order were found 
between the employed and unemployed. In general, age, the fact of living 
at home or away from home, and the employment or non-employment of 
either or both parents were significantly related to attitudes. Men receiving 
relief were not characterized by feelings of inferiority. Peck and Beckham 
(938) compared the attitude toward relief of children between the ages of 
seven and fourteen from three relief groups and two non-relief groups. Six 
hundred and eighty-six children were used in the study. All groups, except 
the work relief group, showed attitudes more expressive of unwillingness 
than willingness to receive government aid. Stagner (961) studied the 
relationship between economic status and personality by means of the 
Wisconsin Scale of Personality Traits. Poverty had not improved the per- 
sonalities of the 128 college students studied. Children from homes of low 
economic status tended to develop feelings of inferiority, traits of nervous- 
ness or emotionality, and social passivity or seclusiveness. 

Miscellaneous studies—Rothney (948) used the Allport-Vernon Study 
of Values Test with high-school pupils. He found no significant relationship 
between scores on this scale and achievement in high-school subjects. 
Corey (884), in a carefully conducted study, used Miller’s and Yeager’s 
scales to evaluate attitudes toward teaching and professional training. 
Seventy-five college juniors and seniors in Wisconsin were used. A statis- 
tically significant difference was obtained between the means of the Sep- 
tember and January tests. The change was in the direction of improved 
attitude toward teaching. There was a coefficient of correlation of only .30 
between the September and January tests. Corey (885) found a coefficient
of correlation between attitude toward cheating as indicated by a question- 
naire whose reliability was reported as .91 and actual cheating of .02. The 
coefficient of correlation between actual cheating and temptation to cheat 
was .46. 

Interests—Walters and Eurich (977) used the Minnesota Interest Blank 
with 426 women students. A comparison of the interests of freshmen and 
seniors showed that there was a high degree of permanency of interests 
during the four college years. Dimmick (890) used Miner’s blank for the 
analysis of work interests with psychology students. Statistically reliable 
differences between the group which earned A and B grades and the group 
which earned D or E grades were obtained. 

Dunlap (896) used the Dunlap Academic Preference Blank to study 
the relationship between constancy of expressed preferences and such 
factors as achievement and intelligence. Degree of constancy was shown 
by degree of change during the period of ten months which elapsed be- 
tween two applications of the blank. Individuals varied from a constancy 
rating of 16 percent to 72 percent. The mean constancy rating was 54 per- 
cent. A positive relationship was found between degree of constancy and 
both achievement and intelligence. Symonds (972) asked boys and girls 
in two city high schools to rank fifteen major areas of life according to two 
specific points of view. Boys recognized money matters as productive of 
problems more than girls—on the other hand, girls stressed appearance 
and etiquette as productive of problems more than boys. 


Character, Behavior, and Delinquency 


Speer (960) investigated the extent to which the Bernreuter Personality 
Inventory could be relied upon to separate 58 children who were judged 
to have personality and character problems from 184 who were not problem 
cases. No significant differences were obtained. Mathews (928) used the 
Haggerty-Olson-Wickman Behavior Rating Scale to select two groups of 
thirty secondary-school boys. One group received the highest scores on the 
scale and the other was given the lowest out of 93 boys. Significant differ- 
ences between the groups were found for such factors as: strained relation- 
ships between the child and his parents, attitude toward authority, social 
manners, persistence, and reaction to frustration. The only significant 
difference between the groups on the six scales of the Bernreuter Personality 
Inventory was for sociability. Jones (919) used a battery of character 
tests and teachers’ ratings at the beginning and at the end of a year’s work 
in the social studies, with approximately 300 seventh- and eighth-grade 
subjects, to determine the efficacy of three specific methods of teaching 
character and citizenship. This important experiment should cause marked 
changes in the teaching of the social studies. Only the experiencing plus 
discussion method yielded gains consistently greater than the gains made 
by the control group. Little or no improvement took place with any of the 
three methods when the testing and teaching situations were quite different
from one another. Retests after six months showed that the gains had been 
fairly well maintained. 

Harriman (909) found a significant difference between the scores of 
epileptics and normals paired for educational status and chronological 
age on the Kohs Ethical Discrimination Test. The epileptic group made 
the lower score. 

Meyering (930) used the Haggerty-Olson-Wickman Behavior Rating 
Scale and the Woodworth-Mathews Personal Data Sheet with 100 boys in a 
camping situation. The Haggerty-Olson-Wickman Scale scores were related 
to the number of actual problems revealed in the camp situation. Baruch 
(870), in an excellently executed study, found many significant relation-
ships between the behavior maladjustments of 33 young children and
reported tension between the parents. Keys and Guilford (920) made a 
very significant comparative study of the validity of the Bell Adjustment 
Inventory, the Bernreuter Personality Inventory, and the Personal Index 
of Loofbourow and Keys for predicting problem behavior in ninth- and 
tenth-grade children. Their conclusion was that “for no test or inventory 
are the correlations shown sufficient for accurate prediction of the behavior 
of individual pupils.” The Personal Index correlated higher (r = .41)
with the criterion than any of the other inventories. 

Delinquency—The most thorough experimental study of delinquency 
which appeared during the period under consideration was directed by 
Healy and Bronner (910). One hundred and five delinquent children were 
matched with 105 non-delinquent siblings of as nearly the same age as 
possible. The two groups were compared for degree of adjustment to the 
family, emotional experiences, etc. Many important differences were found 
between the delinquents and the controls: the delinquent group had less 
desirable developmental histories than the controls; they also tended to 
be more restless and hyperactive; 91 percent of the delinquents, as con- 
trasted with 13 percent of the controls, had experienced pronounced 
emotion-provoking relationships with others. Hill (913) obtained ratings 
on 70 items of behavior for three groups, namely, 517 reformatory
inmates, about 1,000 high-school pupils, and 148 adults. Nothing definitely 
symptomatic of delinquency was found. Durea (897) failed to find any 
significant relationship between ratings on the Furfey Developmental Scales 
and certain indicators of degree of juvenile delinquency. In another study,
Durea (898) found a highly significant difference (critical ratio = 7.2)
between a selected group of thirteen-year-old delinquents and a control
group of similar age, on an adaptation of the Pressey Interest-Attitude Test. 
A statistically significant difference between a group of 115 delinquents 
and a group of 374 non-delinquents was also reported.
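
For readers unfamiliar with the critical ratio cited above, the following minimal sketch shows the usual computation: the difference between two group means divided by the standard error of that difference. The function name and all numerical values are assumptions for illustration; none of them are taken from Durea's studies.

```python
import math

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of the difference."""
    se_difference = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_difference

# Hypothetical groups (illustration only): two samples of 100 with equal spread.
print(round(critical_ratio(50.0, 10.0, 100, 44.0, 10.0, 100), 2))
```

A ratio of 3 or more was conventionally read as a reliable difference in the literature of this period, so a value as large as 7.2 leaves little room for doubt about the group difference, though it says nothing about individual cases.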

Pescor (940) gave the Neymann-Kohlstedt Test to 1,000 delinquents. 
A split-half reliability of .19 and retest reliability of .63 were obtained. 
The scores tended to be grouped in a neutral zone. Bartlett and Harris
(868) used a battery of tests in a controlled experiment to determine per-
sonality factors in delinquency. Delinquents showed greater emotional
instability, more difficulty in maintaining satisfactory home, family, and 
school relationships, more participation in socially undesirable leisure- 
time activities, and a greater tendency to cheat on classroom tests. Durea 
(899) also found that delinquents were more retarded emotionally than 
normals of the same age. Houtchens (914, 915), in two matched pair ex- 
periments where delinquents were matched with non-delinquents for 
chronological age, intelligence quotient, socio-economic status, and school 
grade placement, found that there was a coefficient of correlation of .645 
between conflict scores derived from a combination of the Kent-Rosanoff 
word association and Luria tension pressure technics, and delinquency. 
The distribution of conflict scores was found to be bi-modal. The two groups 
differed most in qualitative analysis and pattern response. 


Vocational Fitness and Guidance 


Studies of teachers—Sandiford and others (950) used the Bernreuter 
Personality Inventory as one of the measures in their excellent studies on 
forecasting teaching ability. They concluded that “ability in practice teach- 
ing is not measured by ‘personality tests’; in fact, considerable doubt exists 
at the present time as to what, if anything, these tests do measure. Measure- 
ment of personality traits is of no value if these traits are arbitrary and 
unrelated to other phases of human life.” Heilman and Armentrout (911) 
used the Purdue Rating Scale for Instructors in studying class reactions 
to twenty-three teachers at two different times five to seven years apart. 
Trait means for individual instructors were not highly reliable. Engelhart 
and Tucker (901) had 224 high-school pupils underline on a 100-item list 
of traits those of the best teacher. Traits receiving high ratings were: clear- 
ness in explanation, tolerance, sincerity, impartiality, and interest in pupils. 
Nogami and Sato (934) obtained ratings of an ideal teacher from 2,238 
pupils in the middle, normal, and higher schools of Japan. Character and 
sympathy stood highest, while cooperation and personal appearance were 
among the lowest. Yeager (984) studied, by means of personality tests and
questionnaires, the traits of high-school seniors interested in teaching.
One hundred and nine seniors were compared with 500 unselected seniors. 
The weakest point of high-school girls interested in teaching, when com- 
pared with unselected girls, was personality. The weakest point of high- 
school boys interested in teaching was intelligence. Peck (939) studied 
the adjustment difficulties of three groups, namely, (a) 100 women
teachers, (b) 26 men teachers, (c) 52 women students. The Thurstone 
Personality Schedule was used. The group of women teachers was less well 
adjusted than either the women students or the men teachers. 

Williamson and Darley (980) made the most complete and systematic 
evaluation of guidance work published during the three-year period. It is 
suggested that every personnel program include plans for evaluating its 
work. One thousand two hundred and seventy-three items of information
were collected for 196 individuals who had 784 problems. Between five 
and fifteen tests were given to each case in addition to those given during 
the entrance test program. Of the 784 problems the three most common 
groups of problems were: vocational, 300; educational, 227; and social, 
personal, and emotional, 136. The suggestions, advice, and recommenda- 
tions given to the students were classified and a follow-up study one or 
more years after was made. Satisfactory adjustment was made in 91 out 
of 94 cases who followed the advice which was given. Only three students 
out of 37 who did not follow the advice given at all made satisfactory adjust- 
ment. The authors caution readers against making careless generalizations 
from the results of this study. They promise a more extensive study of about 
1,000 cases. 

Salesmen—Husband (917) used the Wisconsin Scale of Personality 
Traits to study the differences in personality between 64 salesmen and 1,000 
college students. The salesmen were reliably less neurotic, more self- 
confident, and more self-sufficient than the students. There was only a 
slight correspondence between the efficiency ratings of the salesmen and 
the traits measured by the Wisconsin Scale, but the better salesmen tended 
to be less neurotic and more extraverted. Schultz (954) also found that 
extraversion was somewhat predictive of selling success and that dominance 
to a moderate degree and intelligence above the 20th percentile are related 
to the prediction of success in selling. Bills and Ward (872) gave the 
Strong Vocational Interest Blank and the Bernreuter Personality Inventory 
to 96 casualty insurance salesmen. They also found that high Bernreuter 
scores were significant for predictive purposes. Both high and low scores 
on the Strong Vocational Interest Blank were useful in prediction. Dodge 
(893) studied the differences between clerical workers and salespersons on 
the Bernreuter Personality Inventory. The salespeople possessed more 
social dominance than the clerical workers. 

Miscellaneous—Dimmick (891) found that some of the items in Miner’s 
Blank for analysis of work interests differentiated graduate engineers from 
college freshmen. McIlnay and Jensen (926) studied 538 student flyers; 
extraverts had a slightly better chance of being graduated. Quayle (946) 
studied 124 stenographers, 63 of whom reported that they were happy and 
61 that they were unhappy or doubtful of being happy in their vocation. 
The group filled out a 175-item questionnaire, the A-S Reaction Test, some 
of Maller’s Character Sketches, the Otis Test of Mental Ability, the Minne- 
sota Test for Clerical Workers, and the Strong Vocational Interest Blank 
for Women. Very few of the differences between the happy and unhappy 
groups were statistically significant. The happy group was different from 
the other on the following: as children they were not considered nervous, 
they were happy in school, deliberately chose business, and believe they 
have had a normally developed love life. 

The extent to which several measuring instruments are useful in voca- 
tional guidance and counseling has been studied by many investigators. 
Strong (966) used his Vocational Interest Blank in a five-year follow-up
study of Stanford University graduates. He found close agreement between 
the choice of an occupation made during college and the results of the 
interest test given five years later. There was also close agreement between 
the interest test scores made in college and those given five years later 
although half the group changed their occupation within five years of leav- 
ing college. 


Comment 


The writer believes there have been certain improvements in the studies
covered during the 1935-1937 period as contrasted with those of the preced- 
ing period. There was a considerable increase in the number of studies 
which used experimental and control groups; the standard errors of the 
statistical measures employed were more frequently reported; and there 
were many studies which used large populations (200 or more), although 
relatively few of these presented convincing evidence of the extent to which 
the populations used were representative of some large socially significant 
group. 

The outstanding criticism of the studies under review, from the point of 
view of the “consumer” of research, is that, although the results are fre- 
quently significant for groups, the extent to which the conclusions apply 
to each individual of the group is not clearly indicated. Where the chief 
reason for making an investigation is to secure data that will aid in under- 
standing and giving help to an individual, statistical measures should be 
applied which will show the extent to which this aim has been attained. 

More studies are needed which cover a longer period of time, especially 
those dealing with changes which have taken place as a result of specifically 
described and controlled factors. In studies of the amount of change which 
takes place as the result of an experimental factor, measurement of ex- 
tent of permanency after six months or a year should be made. 

More emphasis should be placed by research workers on studies which 
may be validated in terms of socially significant behavior. The greatest 
of care should be taken to select a group as representative as possible of a 
large important population. By adherence to these two criteria educational 
research will be able to make a still greater contribution to our understand- 
ing of mental and social phenomena. 


CHAPTER VIII


Developments in Statistical Methods Related to Test 
Construction¹


EDWARD E. CURETON and JACK W. DUNLAP 


EDUCATIONAL STATISTICS have made considerable progress in the last few
years. Only a short time ago the competent mathematicians interested in edu- 
cational test theory and the competent educators conversant with even a 
minimum amount of mathematics could almost have been counted on one’s 
fingers. Today there are dozens in each of these groups, and there is every 
reason to believe that in a few more years there will be hundreds. With the 
founding of the Psychometric Society in 1935 and the publication of the 
first issue of Psychometrika in 1936, the allied field of mathematical psy- 
chology may fairly be said to have become a recognized science. 


Scoring and Computing Aids 


The computational field divides naturally between the Hollerith methods 
and those which employ standard calculating machines. For the first group 
the handbook edited by Baehne (988) is already a classic. For the second, 
Dunlap’s manual (1011) provides complete and detailed instructions. 

Cuff (1003) devised a Testometer which scores a perforated answer sheet 
by weighting a set of plungers that drop through the correctly punched 
holes. The International Test Scoring Machine is emerging from the ex- 
perimental stage (1050). When it becomes a little more generally available 
it should effect a revolution in mass scoring and in item analysis. New issues 
of the Strong Vocational Interest Blank and of nearly all the Cooperative 
Achievement Tests are being revised to use its special answer sheets. 

A number of tables and nomographs to facilitate the computation of 
item-test correlations have appeared. The most generally useful of these, 
probably, are Dunlap’s table of p/z (1012) and Kuder’s nomograph 
(1064). Arnold and Dunlap (987) have also prepared a nomograph for 
computing Spearman-Brown correlations and their standard errors. 
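
As a present-day illustration of the kind of quantity such a nomograph yields, the minimal sketch below applies the Spearman-Brown formula, k r / (1 + (k - 1) r), to step up a half-test correlation to full length. The function name and the sample value are assumptions for illustration; they are not taken from Arnold and Dunlap, and the standard error is not computed here.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability of a test lengthened k times, given the reliability r of the original."""
    return k * r / (1.0 + (k - 1.0) * r)

# Hypothetical split-half correlation of .60 (illustration only), stepped up to full length.
print(round(spearman_brown(0.60), 3))
```

With k = 1 the formula returns the original coefficient unchanged, which serves as a quick check.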


Scales and Scaling Methods


Chi (998) succeeded in measuring separately the reliability, “internal” 
validity, and halo effects in personality ratings. After correcting for halo he 
found that a single general factor plus specifics was enough to account 
for the intercorrelations among nineteen rated traits. He identified his gen-
eral factor tentatively with Webb’s “volition.” Flanagan attempted to de-
vise a scale unit for the Cooperative Achievement Tests that would be con-
stant from test to test and from age to age (1002). Bradway and Hoffeditz
(993) translated the important parts of Heinis’ original paper on the per-
sonal constant. They also described the Vermeylen Scale on which it was
based, and added a number of penetrating comments and criticisms of their
own. Cureton (1004) pointed out the real meaning of the A.Q., repeated
and extended Huffaker’s derivation of its standard response error, and de-
vised corrections for its systematic errors. Richardson and Stokes (1078)
tested all the children in an English town (12,000 in number) at once, and
confirmed Thurstone’s finding that absolute variability in intelligence in-
creases regularly with age, the standard deviation at any age and on almost
any test being close to .18 times the absolute mean. Lorge (1067) discov-
ered that all of the decreases in intelligence test scores with advancing age
could be attributed to the speed factor.

¹ Bibliography for this chapter begins on page 357.

The dividing line between mental test theory and psychophysics is rap- 
idly disappearing. Guilford (1035) showed that the difficulty of a test item 
from the Seashore Tests of Musical Talent is inversely proportional to the 
logarithm of the stimulus-magnitude, and suggested that the unit of absolute 
scaling as applied to test items might become a satisfactory unit for psycho- 
physical problems also. His recent text (1034) started with a consideration 
of psychophysics and worked from there toward mental test theory. Barn- 
hart (989) applied pair comparison, order of merit (ranking), and single 
judgment (like-dislike) methods to the scaling of affective judgments on 
simple geometric forms, and found no significant superiority of the more 
complicated and time-consuming methods over the method of single judg- 
ments. Guilford (1036) found it possible to scale a series of items when each 
of a large number of judges had expressed only his first choice and his last 
choice. This method opens up the vast field of ordinary voting preferences 
to the methods of psychophysics. F. W. Irwin (1051) reviewed the litera- 
ture of this difficult field admirably, and Woodworth’s long-awaited mono- 
graph (1116) appeared in 1936. 

In the related field of the theory of frequency distributions, Zoch (1120) 
pointed out that in all of the bell-shaped Pearson curves, the points of 
inflection are equidistant from the mode. This is a serious limitation in the 
case of skew curves, and often causes them to fit experimental data rather
badly.


Correlation and Regression 


Deming (1006) in a remarkable series of papers, generalized the theory 
of least squares, and pointed out that the validity of the chi-square test of 
goodness of fit depends on the assumption that the fitting was done by 
least squares. Hotelling’s epochal study (1047) laid the foundation for an 
almost infinite variety of new extensions of the theories of correlation and 
factor analysis. Bernstein (991) opened up the field of the theory of least
absolutes again, and this may well become a legitimate competitor of the 
theory of least squares. 

Hotelling and Pabst (1046) and Friedman (1026) provided exact tech- 
nics for tests of significance based on ranks. These tests avoid the assump- 
tion of normally distributed variates, and apply to most of the field prev- 
iously covered by normal correlation and the analysis of variance. 
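
The idea of an exact test based on ranks can be illustrated by a brief modern sketch. It is not taken from either paper: it assumes untied scores, uses the familiar rank-difference coefficient, and finds the exact one-sided probability by enumerating every possible pairing of the ranks, which is practical only for small samples. All function names and data are hypothetical.

```python
from itertools import permutations

def ranks(values):
    """Ranks from 1 (smallest) upward; this sketch assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rho_from_ranks(rx, ry):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def exact_one_sided_p(x, y):
    """Observed rho and the share of all n! pairings with rho at least as large."""
    rx, ry = ranks(x), ranks(y)
    observed = rho_from_ranks(rx, ry)
    hits = total = 0
    for shuffled in permutations(ry):
        total += 1
        if rho_from_ranks(rx, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, hits / total

# Hypothetical scores for five pupils on two tests (illustration only).
rho, p = exact_one_sided_p([10, 12, 15, 19, 25], [31, 30, 38, 36, 40])
print(rho, p)
```

Because the probability is obtained by counting pairings rather than by reference to a normal curve, no assumption about the form of the score distributions enters the computation.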

A number of improved methods for computing partial and multiple cor- 
relations and regressions have appeared recently. McIntyre (1069) system- 
atized the computations from raw data, Griffin (1033) simplified his pre- 
vious simplification of Yule’s method, and Mosak (1071) improved on 
Horst’s generalization of the Doolittle method. Dwyer and Meacham (1014) 
reported a method for printing a correlation table on a Hollerith tabulator 
equipped with digit selection, provided one variate contains not more than 
ten categories. The tabulator provides marginal totals and cumulative (pro- 
gressive) totals at the same time, from which means, standard deviations, 
skewness coefficients, and the correlation coefficient can be computed read- 
ily. The method can also be used to obtain the cumulative frequencies (pro- 
gressive totals) of all columns at once, thus facilitating the computation of 
medians, quartiles, deciles, means, and standard deviations of ten variables 
simultaneously. 

Stouffer (1095) showed how to correct partial and multiple correlations 
for attenuation. Cureton (1005) discussed the relations between experimen- 
tal setups and reliability computations, and derived the standard errors 
of a number of commonly used correlation functions. T. L. Kelley (1058) 
presented a new formula for a correlation ratio unbiassed by the arbitrary 
division of the data into categories. 

Several recent papers have dealt with the perplexing problem of weight- 
ing several criteria to obtain a composite. Hotelling (1045) weighted them 
in such a manner as to minimize the mean square error of prediction by a 
definite battery of independent variates. Edgerton and Kolbe (1015) 
weighted them so as to minimize the mean square variation of the several 
standard-score criterion measures of each individual, without reference to 
any independent variates; and showed that this method also resulted in 
maximizing the standard deviation of the composite scores. Kurtz (1066) 
did not combine his criterion-scores at all, but showed how to weight two 
independent variates so as to maximize their average intercorrelation with 
several separate criteria. 

Conrad (1001) and Ghiselli and Kuznets (1030) derived formulas for 
computing the correlation between a subtest and the remainder of the bat- 
tery, or between an item and the remainder of the test. Dunlap (1010) de- 
rived formulas for obtaining the correlation in a large group, knowing the 
means, standard deviations, and correlations in several subgroups. Baten 
(990) solved a similar problem involving the skewness coefficient. 
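
The subgroup problem can be illustrated with the ordinary moment identities. The sketch below is not Dunlap’s own formula, which is not reproduced in this review; it assumes standard deviations computed with divisor n, and the names `describe` and `pooled_correlation` are hypothetical.

```python
from math import sqrt


def describe(xs, ys):
    """Summarize one subgroup as (n, mean_x, mean_y, sd_x, sd_y, r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    return n, mx, my, sx, sy, r


def pooled_correlation(groups):
    """Correlation in the combined group from subgroup summaries alone."""
    total_n = sum(g[0] for g in groups)
    mean_x = sum(n * mx for n, mx, my, sx, sy, r in groups) / total_n
    mean_y = sum(n * my for n, mx, my, sx, sy, r in groups) / total_n
    # Rebuild the raw second moments of each subgroup, then average them.
    ex2 = sum(n * (sx ** 2 + mx ** 2) for n, mx, my, sx, sy, r in groups) / total_n
    ey2 = sum(n * (sy ** 2 + my ** 2) for n, mx, my, sx, sy, r in groups) / total_n
    exy = sum(n * (r * sx * sy + mx * my) for n, mx, my, sx, sy, r in groups) / total_n
    var_x = ex2 - mean_x ** 2
    var_y = ey2 - mean_y ** 2
    return (exy - mean_x * mean_y) / sqrt(var_x * var_y)
```

The pooled value agrees with the correlation computed directly from the combined raw scores, because differences between subgroup means are carried into the total variances and covariance.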


Reliability


Willoughby (1114), Sandon (1082), Goodenough (1032), Thouless 
(1099), Jellinek (1054), and Gulliksen (1037) discussed at some length 
the problem of the essential meaning of reliability. The varieties of fluctua-
tion and error comprehended within the term include at least the following:

Test errors: item sampling error; item weighting error
Response errors: motivational fluctuation; function fluctuation
Reader errors: subjectivity error; halo effect
Clerical error


Various corrections and remedies have been proposed. Willoughby would 
eliminate items that differ only verbally from other items without calling 
for an actual reorientation of the examinee’s thinking. Goodenough found 
that the odd-even correlation, “stepped up” by the Spearman-Brown 
formula, actually exceeded the test-retest correlation in several cases. 
Thouless described an experimental and statistical procedure for measuring 
function fluctuation. Jellinek preferred to use the intraclass correlation as 
a reliability coefficient. Gulliksen showed how to measure reader reliability 
and test reliability separately in an essay examination. Kuder and Richard- 
son (1065) gave an extended analysis of test reliability in terms of the 
difficulties and intercorrelations of the items. Stephenson (1090) pointed 
out that for many purposes the saturation of a test with a general factor is 
more important than its reliability. This is of course only a re-emphasis on 
the priority of validity. Conrad and Martin (1000) dealt with the same prob-
lem in another way when they derived an “index of forecasting efficiency”
($E = 1 - \sqrt{1 - r^{2}}$) corrected for attenuation in the criterion.
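
Three of the quantities mentioned in this section can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors’ procedures: item scores are taken as 0 or 1, the Kuder-Richardson computation uses the form now usually called formula 20, the index of forecasting efficiency is taken as 1 - sqrt(1 - r^2), and the attenuation correction (dividing r by the square root of the criterion reliability) is assumed. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_part, k=2):
    """Reliability of a test k times as long as the part with reliability r_part."""
    return k * r_part / (1 + (k - 1) * r_part)


def odd_even_reliability(item_scores):
    """Rows are examinees, columns are items; step up the odd-even correlation."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(odd, even))


def kr20(item_scores):
    """Reliability from item difficulties and total-score variance (0/1 items)."""
    n, k = len(item_scores), len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    p = [sum(row[j] for row in item_scores) / n for j in range(k)]
    return k / (k - 1) * (1 - sum(pj * (1 - pj) for pj in p) / variance)


def forecasting_efficiency(r, criterion_reliability=1.0):
    """1 - sqrt(1 - r^2), with r first corrected for criterion unreliability."""
    corrected = r / sqrt(criterion_reliability)
    return 1 - sqrt(1 - corrected ** 2)
```

With a validity coefficient of .60 the index is only .20, which illustrates how slowly forecasting efficiency rises with the correlation.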


Sampling Theory 


The theory of small samples, involving as it does a great deal of mathe- 
matics beyond elementary calculus, has often proved a stumbling block to 
those whose backgrounds are primarily non-mathematical. These latter will
welcome especially a paper by Jackson (1053) giving elementary deriva- 
tions of several fundamental formulas, as well as two descriptive papers by 
Wilks (1112, 1113). The use of the more exact tests of significance has been 
extended in a series of brilliant papers by Fertig (1018, 1019, 1020, 1021),
Dorfman (1009), Neyman and Tokarska (1074), Ricker (1079), and
others. Conrad and Krause (999, 1063) extended the extreme tails of the
Kelley-Wood table, and provided a new table of the normal probability
integral giving values of x/pe for given areas.


The Analysis of Causation


Wright (1117) has given a new and complete summary of his important 
method of path coefficients. Johnson and Neyman (1056) and Kimball 
(1061) devised technics for determining the significance of the difference 
between two regression equations, making experimental pairing and equat- 
ing of groups unnecessary, and increasing the amount of information on 
which the test of significance is based, thus generalizing the problem of 
matched groups. 

Sterne (1093) solved approximately the problem of the distribution 
function for Rhine’s extra-sensory perception experiments. Jellinek (1055) 
deplored the misuse of statistics by psychologists and psychiatrists, and 
reiterated the point that even good statistics do not improve bad data. 
Snedecor and Cox (1087) presented a number of new technics in the 
analysis of variance. 

The general problem of the probability of correct matching has received 
renewed emphasis through the work of Vernon and others on personality 
estimates. It has been shown, for example, that given general verbal char-
acter sketches and photographs of unknown persons, a group of judges can
match them correctly with a frequency far above chance—even though 
these same judges may fail to rank the pictures on any given trait in such a 
manner as to obtain a correlation with the correct ranking significantly 
above zero. Vernon (1108) reviewed most of this work. The mathematical 
problems and certain of their implications were considered by Chapman
(996, 997) and by Vernon (1109).
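
The chance baseline for the simplest matching experiment can be computed exactly. The sketch below assumes that n sketches are paired one-to-one with n photographs at random, so that the number of correct matches is the number of fixed points of a random permutation; the function names are hypothetical, and the more general cases treated by Chapman and Vernon are not covered.

```python
from math import comb, factorial


def derangements(m):
    """Number of orderings of m objects that leave none in its own place."""
    counts = [1, 0]  # values for m = 0 and m = 1
    for i in range(2, m + 1):
        counts.append((i - 1) * (counts[i - 1] + counts[i - 2]))
    return counts[m]


def p_exactly(n, k):
    """Chance of exactly k correct matches among n randomly paired items."""
    return comb(n, k) * derangements(n - k) / factorial(n)


def p_at_least(n, k):
    """Chance of k or more correct matches among n randomly paired items."""
    return sum(p_exactly(n, j) for j in range(k, n + 1))
```

Under random pairing the expected number of correct matches is 1 whatever the value of n, so a judge who regularly places several of five or six sketches correctly is doing far better than chance.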


Item Analysis and Validity 


Merrill (1070) objected to indiscriminate item analysis, and devised
a modification of the chi-square technic to show whether or not a whole 
table of item-test correlation differences could be attributed to chance. 
Sletto (1086) showed that a population of at least 400 is necessary to 
determine stable item-discrimination values, and that items which are highly 
discriminative in one group may be non-discriminative in another. Mosier 
(1072) discovered eight independent factors among a set of items picked 
from a much larger set on the basis of high discriminating power. In the 
light of these findings, a number of studies dealing with the relative merits 
of different item-selection technics must be considered as of secondary 
importance. A possible exception is Horst’s new method (1042), which is 
based on the item intercorrelations as well as on their correlations with the 
total test and with an outside criterion. 

A number of writers have dealt with the problem of key-correlations in 
multi-trait tests such as the Strong Vocational Interest Blank and the Bern- 
reuter Personality Inventory, and considerable controversy has arisen. 
The arguments were reviewed by Flanagan (1025), one of the principals in 
the controversy, and the opposite viewpoint was presented in purely theo-
retical terms by Royer (1081).

Dickinson (1008), in a thought-provoking manner, discussed the essen- 
tial meaning of validity, and the methods by which it is obtained. 

The practical problems of item construction have been attacked by several 
investigators. J. M. Stalnaker and R. C. Stalnaker (1089) found that 
selected distractors (wrong alternatives) improved the validity of a mul- 
tiple-choice vocabulary test, and V. H. Kelley (1059) found the “arm-chair” 
judgment of the test builder just as good as a laborious tabulation of errors 
on a recall test for producing such selected distractors. Votaw (1110) 
demonstrated that the use of do-not-guess instructions and the rights-minus- 
wrongs formula with a true-false test penalized the intelligent but sub- 
missive student unduly, and recommended the use of instructions to mark 
every item. Bird and Andrew (992) found by the criterion of internal con- 
sistency that one-word-completion items were more valid than recognition 
items, even though these recall items scarcely ever formed over one-fourth 
of the criterion-test. Feinberg (1017) made an intensive study of the re- 
sponses to one word (mellow) from the Stanford-Binet vocabulary. He 
found that it was misplaced by six years from its correct difficulty level, 
that success in defining it was related to sex and age when mental age was 
held constant, and that on the credit side it has had a uniform meaning 
throughout its history. 

Methodological studies have not been lacking. Remmers and associates 
(1077) successfully constructed generalized attitude scales by using a 
variant of the Thurstone technic. Kirkpatrick and Stone (1062) objected 
to the assumption of the attitude continuum and the method of equal-ap- 
pearing intervals. Instead they preferred to define each item as pro or con 
in constructing a scale of attitude toward religion, and to group the items 
in logical categories. This led them to the development of a belief-pattern 
scale, which was shown to possess considerable discriminating power. 
Dunlap (1013) constructed items somewhat similar to Strong’s, but with 
four responses (like, indifferent, dislike, and unknown), to measure chil- 
dren’s interests in school subjects. These items were prepared with relation 
to specific subjects and validated against the appropriate subtests of the New 
Stanford Achievement Test. A few highly discriminative items were then 
revalidated against other subtests, where they often proved useful. He con- 
cluded that multiple weighting would be distinctly advantageous in reducing 
the length of the test. Young (1118) derived a residual index ($RI = Y - b_{yx}X$).
With Estabrooks (1119) he used residual scholarships (test-intelli-
gence constant) as a criterion in developing a studiousness-scale key for the
Strong Vocational Interest Blank at Colgate University. Mosier (1073) 
used the scale on a different kind of a group (one that included technical 
students) at the University of Florida, and found it to be limited in its 
validity. 
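
A residual index of the form Y minus b times X can be sketched as follows. This is only an illustration under stated assumptions: b is taken to be the ordinary least-squares regression coefficient of Y on X, and the function names are hypothetical rather than Young’s.

```python
def regression_slope(x, y):
    """Least-squares regression coefficient of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residual_index(x, y):
    """Y - b * X for each person, for example scholarship with test score held constant."""
    b = regression_slope(x, y)
    return [yi - b * xi for xi, yi in zip(x, y)]
```

The resulting index is uncorrelated with X in the group from which b was computed, which is the sense in which the test score is held constant.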

Several new studies on the scoring of the rearrangement test have ap-
peared. The most serviceable scoring formula to date is that of Sims (1085):
$\mathrm{Score} = n - 3d/n$, where n is the number of items and d is the sum of the
differences between the student’s ranks and the key ranks. All negative
scores are given the score zero.
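
The formula quoted from Sims can be applied mechanically, as in the short Python sketch below. One assumption is made explicit: the differences are taken as absolute values, since signed rank differences would always sum to zero. The function name is hypothetical.

```python
def sims_score(student_ranks, key_ranks):
    """Score = n - 3d/n, with negative scores raised to zero."""
    n = len(key_ranks)
    # d: sum of absolute differences between the student's ranks and the key.
    d = sum(abs(s - k) for s, k in zip(student_ranks, key_ranks))
    return max(0.0, n - 3 * d / n)
```

For five items, a perfect arrangement scores 5, a single interchange of two adjacent items scores 3.8, and a completely reversed arrangement falls below zero and is therefore scored zero.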


Factor Theory 


This has been the dominant subject in educational and psychological 
statistics during the last few years. Spearman’s two-factor theory and Thom- 
son’s sampling theory, though eclipsed from time to time by the spectacular 
successes of rival theories, have continued to progress. J. O. Irwin (1052) 
showed that the indeterminacy in g could be reduced indefinitely by in- 
creasing the number of tests; Piaggio (1076) devised a method for de- 
termining g and s approximately; and Thomson (1098) showed how these 
estimates could be improved through the use of certain non-hierarchical
systems. Thomson (1097) also presented a complete treatment of his 
sampling theory and concluded that g is a useful mathematical description 
but is not a psychological reality. 

Thurstone (1104) published a definitive statement of his multiple factor 
theory, which has become the basic starting point for most of the later 
work in this field. His Primary Mental Abilities (1102) has just come from 
the press—too late for review here. Hotelling supplemented his classic 
paper (1044) by another (1048) presenting a simplified scheme of compu- 
tation. T. L. Kelley (1057) described a variant of this technic and went 
on to discuss the fundamental problems of factor analysis in a cogent and 
penetrating manner. Holzinger (1041) developed his bi-factor theory, a 
theory intermediate between those of Spearman and the unrestricted mul- 
tiple factorists, and then proceeded to embody it in the only treatment of 
factor theory which to date deserves the name “clear.” Burt (995) devised 
a method somewhat similar to those of T. L. Kelley and Hotelling, and 
worked out comparisons between these methods and Thurstone’s. Woodrow 
and Wilson (1115) described a variant of the centroid method which 
avoids the necessity for the rotation of axes demanded by Thurstone and 
gives meaningful factors from the outset. Horst (1043) proposed to use a 
method of computation proceeding directly from the score matrix, and giv- 
ing all significant factors at once. Factors having small loadings could be 
discarded after the analysis was complete. 

Frisch (1027), working entirely independently of the American and
British groups, and upon a somewhat different problem, arrived at results 
whose resemblance to those of the factor theorists is startling. Everyone 
interested in the deeper implications of the theory should read his mono- 
graph. 

Tryon (1106) proposed to discard “mathematical factors” entirely and 
to substitute “psychological factors.” 

A host of theoretical papers have appeared, seeking to clarify specific
points and less commonly to point out relationships between the various 
theories. Only a few can be cited. Holzinger and Harman (1040) compared 
the bi-factor method with several variants of Thurstone’s method, partly 
by mathematical analysis and partly by means of a worked-out example. 
Kellogg (1060) showed the relationships between Thurstone’s early least- 
squares-iteration method (later discarded by Thurstone in favor of the 
centroid method), Hotelling’s method, and Kelley’s. His analysis was partly
mathematical, partly computational, and partly, it would almost seem,
intuitional. It checked, however, with the rigorous analysis of Girshick 
(1031), who showed that the principal components of Hotelling’s method 
are maximum likelihood statistics and therefore subject to smaller sampling 
errors than any others, and that the method of principal components does 
not require as many factors as tests, and can be used just as well with com- 
munalities in the diagonals as with reliabilities or unities. From all this 
work it would appear that at present the most efficient factor-analysis 
method would start out with a matrix of raw correlations having estimated 
communalities in the diagonal cells, proceed by Hotelling’s shorter method 
to a determination of the principal components, and end up with rotations 
as suggested by Thurstone. The research worker who is willing to sacrifice 
a little accuracy in order to avoid the admittedly large amount of work 
involved in the above recommendation would probably do well to consider 
the direct and simple method of Woodrow and Wilson (1115). The bi- 
factor method should always be considered as an alternative if the data 
seem likely to conform to its somewhat more restrictive assumptions. It is 
impossible as yet to evaluate Horst’s new method, though a paper by Hoel 
(1038) provided a method for obtaining the necessary advance informa- 
tion regarding the number of factors that will be required. 
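
The first step of the recommended procedure, extracting a principal component from a correlation matrix with communalities in the diagonal, can be sketched by plain power iteration. This is a minimal sketch and not a reproduction of Hotelling’s shorter method: the function names are hypothetical, the matrix is assumed symmetric with a dominant positive root, and the subsequent rotation of axes is omitted.

```python
from math import sqrt


def first_principal_component(R, iterations=200):
    """Return (latent root, loadings) of the largest component of R."""
    n = len(R)
    v = [1.0] * n  # starting trial vector
    for _ in range(iterations):
        # Multiply the trial vector by R and rescale to unit length.
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    root = sum(v[i] * sum(R[i][j] * v[j] for j in range(n)) for i in range(n))
    loadings = [x * sqrt(root) for x in v]
    return root, loadings


def residual_matrix(R, loadings):
    """Remove one component; further components come from the residual."""
    n = len(R)
    return [[R[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]
```

Repeating the two steps on each successive residual matrix yields further components, and extraction can stop when the residuals become negligible, so fewer factors than tests may suffice.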

The number of experimental investigations which have employed the 
factorial methods now runs into the hundreds. Only a few can be mentioned 
here at all. Thurstone (1100) applied 56 tests to each of 240 college stu- 
dents, and isolated 7 primary traits with some certainty and 5 others with 
some uncertainty. His latest work was not published in time for comment 
here. Holzinger (1039), working with nearly 100 tests and two groups of 
children aggregating about 1,100, found a general factor and 7 group 
factors. His group factors clearly resemble Thurstone’s primary traits,
though they are not identical. Holzinger’s data are published in a series of 
pamphlets in such a manner as to permit any other worker to check his 
analyses or to apply other analyses to the same data. Mosier (1072), as has 
already been noted, analyzed 39 out of 42 of the most significant items 
from the Thurstone Neurotic Inventory, and found 6 clear factors and 2 
more which were not so clear. Flanagan (1024) analyzed the 4 scales (not 
the items) of the Bernreuter Personality Inventory, and found 2 principal 
components for which he devised new scoring keys. His monograph pre-
sents a number of additional experimental and statistical results relating
to the methodology of test construction. 

Stephenson (1092), following up a suggestion of Thomson’s, proposed 
to invert the factor technic by applying a large sample of tests to a small 
number of individuals. He would then compute the correlations between 
individuals and apply a factor analysis. Just as a test-factor is named by 
naming the tests in which it has high loadings, so an individual-factor 
would be named by naming the persons in whom it had high loadings. 
This analysis would lead, as the number of tests was increased, to a valid 
description of human types. Stephenson (1091) has carried the theoretical 
implications of this idea further in another paper. 

Zubin (1121) proposed to score, not item responses, but patterns of 
item response. After selecting 140 items from among 632 for their ability 
to differentiate psychotics from normal individuals, he combined them in 
groups according to logical considerations, and scored them by means of 
contingency tables. Some groups of diagnostic items turned out to be 
non-discriminative as groups, while others turned out to be super-discrimi- 
native. On rather slender evidence he suggested that the optimum group 
should contain about five items. 

Spearman (1088) defended the Taylor series approximations inherent 
in all mental measurement theory, and proved that in the case of factors 
which combine by multiplication instead of by addition, the two-factor 
hypothesis leads exactly to the tetrad criterion again. 
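
For reference, the tetrad criterion in the ordinary additive two-factor case can be written out as follows. The notation is an assumption introduced here for illustration: $a_i$ denotes the loading of test $i$ on the general factor, and the specific factors are taken to be uncorrelated, so that each intercorrelation is a product of two loadings.

```latex
r_{ij} = a_i a_j \quad (i \neq j)
\qquad\Longrightarrow\qquad
r_{13}\, r_{24} - r_{23}\, r_{14}
  = a_1 a_3 a_2 a_4 - a_2 a_3 a_1 a_4 = 0 .
```

The same cancellation holds for every choice of four tests, which is why vanishing tetrad differences are taken as the test of the two-factor hypothesis.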


Miscellaneous 


The fundamental questions of scientific method, operational definitions,
implicit and explicit assumptions, and the nature and meaning of measure- 
ment have been discussed by McGregor (1068), Scates (1083, 1084), 
Stevens (1094), and Thurstone (1103). 

Burgess (994), in combining the results of 60 published studies to obtain 
two sets of composite growth curves for the heights of boys and girls, 
demonstrated the meaning of synthetic research in its highest form. 

Hyde (1049) and Dickey (1007) have investigated the statistical con- 
cepts necessary for the reading of educational literature. Neither of their 
lists is very long, but both of their samplings of “educational literature” 
are rather narrow. 

Since Toops and Kuder (1105) covered the field of the present review 
in 1935, Swineford and Holzinger (1096) have published a short but well- 
selected and annotated bibliography every year. Rider (1080) published 
a review of the more technical papers in mathematical statistics just pre- 
vious to the appearance of the Toops and Kuder review. 

A considerable number of new books on statistical methods have ap-
peared recently. Those of most interest to educational statisticians will prob-
ably be (in approximate order of increasing technicality) the ones by Gar-
rett (1028), Enlow (1016), Guilford (1034), Fisher (1023), Peters and
Van Voorhis (1075), and Gavett (1029). Walker and Durost (1111) have
prepared a useful manual on the construction of tables. Uspensky’s Intro-
duction to Mathematical Probability (1107) and Fisher’s Design of Experi-
ments (1022) became classics in their respective fields upon publication.


Summary 


The principal statistical developments of the last three years have been 
the following: 

1. New aids for handling large masses of data with respect both to scor- 
ing and to computation. 

2. An increasing tendency for psychophysics and the mental test theory 
to merge. 

3. Further improvements in the theory of least squares, and the begin- 
ning of a systematic theory of least absolutes. 

4. A critical attitude toward the concepts of reliability, validity, and 
item discrimination, and attempts to redefine and clarify these concepts. 

5. Serious attempts to solve the problem of the combination of criteria. 

6. The application of modern sampling theory and tests of significance 
to educational and psychological problems, especially to the problem of 
matched groups; and the revival and improvement of methods of analysis 
based on ranks. 

7. A revival of interest in the chi-square and contingency methods, and 
a number of new applications of these methods. 

8. A general attack on the problem of human types, through the develop- 
ment of the matching technic, the inverted factor theory, and the methods 
of pattern analysis. 

9. Several new attacks on the problem of factor analysis, and the appli- 
cation of factorial methods to a wide variety of experimental data. 

10. Attempts to restate the fundamental hypotheses of an educational 
and psychological science. 


Needed Research 


Much new research is needed in furtherance of the trends noted above, 
and a few important problems have been neglected during the last three 
years and should be given consideration again. The most important needs 
would include: 


1. Further development of scoring machines, and the evolution of some 
plan for their more general distribution. 

2. Further study of mental growth, both before and after maturity, in- 
cluding a determination of the actual mental ability of the average young 
adult. 

3. A critical reexamination of score scales and the M.A., I.Q., E.Q.,
P.C., and A.Q., taking accurate studies of mental growth as the starting
point.

4. Systematic exploration of the theory of least absolutes.

5. Research with large samples and homogeneous tests along the lines 
suggested by recent criticisms of the concepts of validity, reliability, and 
discrimination. These studies should determine, among other things, the 
absolute and relative magnitudes of the test errors and the response errors
in mental tests. 

6. Further studies on the combination of criteria, and the linking up of 
the findings with those in the allied field of index numbers. 

7. The more general use of modern sampling theory and exact tests 
of significance. This will require much careful expository work to make 
the technics more generally available; and may in fact imply no less than 
a total abandonment of the idea of non-mathematical “elementary statistics.” 

8. Further studies of human types, using improved contingency-matching 
methods, inverted factor methods, and pattern analysis methods. It is pos- 
sible that the basic unit of analysis may have to be the test item rather than 
the score. This will require still more efficient mechanical scoring aids and 
wholesale computation devices. 

9. Synthesizing studies in the field of factor-analysis. The interrelations 
of the factorial methods must be explored more fully than they have been. 
The selection of a factor method should eventually cease to be a matter of 
argument, becoming simply a matter of picking the method whose rationale 
best fits the limitations of a given set of data. 

10. Extension of the work on matched groups and the significance of the 
difference between regression equations to multiple-group comparisons and 
to multiple criteria. 

11. Intensive study of the single test item in a wider variety of situations. 

12. Further work on multi-test scales, leading possibly to a single scale 
to be scored for all interests, attitudes, adjustment trends, etc., which can 
be measured by such checklists in the first place; this work to be preceded
by a careful analysis of the inherent reliability, validity, and significance
of such lists.


References in this Index are to the beginning of chapters and their subdivisions, and
to the first allusion in running (perhaps intermittent) discussions. One should scan 
several pages following the one cited. 


Ability grouping, 251 

Acceleration, 251 

Accidents, 266; see also safety, emotions 

Accomplishment quotient, 244, 250, 308 

Adjustment, surveys, 293 

Aptitudes, see predicting school success, 
vocational aptitude, particular subject 
field 

Arithmetic, development of concepts, 236 

Art, aptitude, 263 

Athletic ability, see motor abilities 

Attitudes, measurement of, 269, 276, 312; 
surveys of, 298; see also racial prej- 
udices, religious attitudes 

Automobile driving, 263; see also safety 

Aviation, aptitude, 265 


Birth rates, 243 


Causation, 310; see also correlation 

Character, measurement of, 269, 302; see
also moral behavior 

Cheating, 285 

Child psychology, see infants, preschool 
children 

Clerical aptitudes, 257 

College success, prediction of, 248 

Correlation, of abilities, 225, 235, 244; 
statistics, 308; see also causation 


Delinquency, 254, 303 
Development, see genetic studies 


Emotions, and accidents, 266; and driving, 
263; development of, 237; variability, 
293 

Environment, and intelligence, 235, 241;
and personality, 238; and school suc-
cess, 248

Eugenics, 243 

Experiments, 310; on infants, 229 


Factor analysis, 231, 239, 257, 286, 297, 
307, 312 
Family relationships, 293 


Genetic studies, see emotions, develop-
ment of; growth; infants; language,
development of; mental development;
personality, development of; physical 
development; play; preschool children; 
social adjustment and behavior 
Graphology, 289 
Graphs, see nomographs, profiles 
Growth, curves, 235; see also genetic 
studies for cross references 


Handedness, 234, 265 

Handwriting, see graphology 

Home, see environment 

Homogeneous grouping, see ability group- 
ing 


Infants, 229; see also preschool children 

Intelligence, and clerical ability, 257; and 
delinquency, 254; and driving, 264; and 
health, 244; and mechanical ability, 
259; and moral behavior, 245; and 
music, 245; and occupations, 242, 251; 
and personality, 245; and physical de- 
fects, 253; and school success, 247; and 
sex differences, 244; surveys of, 246; 
testing programs, 246; see also accom- 
plishment quotient, environment, racial 
differences 

Intelligence tests, applications of, 241; 
for infants and preschool children, 229; 
group, 223; incentives, 225; individual, 
221; non-verbal, 226-230 

Interests, 275; surveys, 298 

Interviews, 279; as tests, 222, 265 

Item analysis, 307, 311 


Judgments, 281; see also rating


Language, development of, 236, 254


Marital relationships, 292 

Measurement, incentives, 225; needed re- 
search, 316; unit of, 308; see also factor 
analysis, rating, reliability, scaling, tests 
and scales, validity, particular subject 
field 

Mechanical aptitudes, 258 

Memory, 226, 236 

Mental development, see intelligence, ge- 
netic studies for cross references 

Mental tests, see psychological tests

Moral behavior, 245, 302; see also cheating 

Motion pictures, see photographic record- 
ing 

Motor abilities, 226, 233, 244, 259; see 
also physical development 

Music, and intelligence, 245; aptitude, 
262; measurement of, 231; psychology 
of, 237 


Nationality and intelligence, see racial 
differences 

Nature and nurture, see environment 

Needed research, intelligence testing, 256; 
measurement, 316; personality study, 
217; psychological tests, 217; reporting 
intelligence scores, 250 

Negroes, see racial differences 

Nomographs, 307 


Observation, 231, 237, 239; of behavior, 
289; see also photographic recording
Occupations, and fecundity, 243; and in-
telligence, 242, 251; see also environ-
ment, vocational aptitudes


Parent-child relationships, 277 

Personality, and delinquency, 255; and in- 
telligence, 245; development of, 237; 
measurement of, 269; needed research, 
217; rating, 281; studies of, 229; sur- 
veys, 292; see also adjustment, psycho- 
logical tests and cross references, rating 

Photographic recording, 233, 237, 240 

Physical defects, 244 

Physical development, 244, 315; see also 
motor abilities, genetic studies for cross 
references 

Pictures, 237; see photographic recording

Play, 231, 239 

Prediction of school success, 247, 275; see
particular subject field 

Preschool children, 229; see also infants

Professional aptitudes, 261 

Profiles, occupational, 266 

Psychological tests and scales, 308; see 
also aptitude, attitudes, character, in- 
telligence, interests, moral behavior, 
needed research, personality, social be- 
havior, surveys, vocational aptitude. 


Racial differences, 230, 234, 237, 243, 295 

Racial prejudices, 279, 300 

Rating, of personality, 281, 307; scales, 
231 

Reading, errors in, 236; interests, 276;
readiness, 247 

Reasoning, 236 

Records, 291; pupil, 246; test, 249; see 
also photographic recording 

Reliability, 309 

Religious attitudes, 279 


Safety, 263; see also accidents, automobile
driving 

Sampling, 239, 310; see also statistical 
methods, tests of significance 

Scales, see scaling, tests and scales, par- 
ticular subject field 

Scaling, 307; see also measurement, scor-
ing, unit of

Scholastic aptitude, see predicting school
success 

Scoring, 307, 311 

Sex differences, in intelligence, 244; in 
music, 262 

Social adjustment and behavior, 238;
measurement of, 231 

Speech defects, 255 

Statistical methods, 307; tests of signifi- 
cance, 310; see also correlation, factor 
analysis, nomographs, sampling, tabulat- 
ing machines, tests, weighting 

Surveys, intelligence, 246 


Tabulating machines, 307, 309 

Teachers, traits, 304 

Tests and scales, construction of, 311;
see also item analysis, measurement, 
psychological tests, scoring, particular 
subject field 


Unit, see measurement 


Validity, 311 

Variability, between abilities, 241; be- 
tween individuals, 241, 258; in oc- 
cupational groups, 267; in performance, 
264; in personality measurements, 271; 
see also ability grouping, reliability, 
sex differences 

Vital statistics, see birth rates, eugenics 

Vocabulary, and intelligence, 248, 253; 
of preschool children, 231; test, 231 

Vocational aptitude, 257, 304; see also 
predicting school success, professional 
aptitudes 

Vocational guidance, 304 


Weighting, 309, 312